Load Data into or Export Data from the Teradata Database - Parallel Transporter

Teradata Parallel Transporter Application Programming Interface

Product
Parallel Transporter
Release Number
15.10
Language
English (United States)
Last Update
2018-10-07
dita:id
B035-2516
lifecycle
previous
Product Category
Teradata Tools and Utilities

Load Data into or Export Data from the Teradata Database

After establishing a connection, load or export data from the Teradata Database. Data can be loaded on a row by row basis using the PutRow function or as a block by using the PutBuffer function which is available to the Load, Update, and Stream drivers. Likewise, data can be exported one row at a time using the Export driver’s GetRow function or multiple rows can be exported together in a buffer using the Export driver’s GetBuffer function.

Data can be loaded into or exported from all of the columns in a table or from a subset of the columns in a table. When loading data into a subset of the columns in a table, default values must be specified for the columns not being loaded. In each case, only the columns being loaded or exported from should be defined in the schema.

For example, Table1 has columns A, B, and C. Columns A and B are defined as integers and column C is a variable character column with a length of 25. To load data into column A only, a valid insert statement would be

INSERT INTO Table1 (A, B, C) VALUES (:A, 0, NULL); 

where zero is the default value for B and NULL is the default value for column C. The input schema for this example would be defined as only having column A.

Row by Row Loading into the Teradata Database

There are three parts to loading data into the Teradata Database using Teradata PT:

  • Load rows
  • Inform Teradata PT that acquisition is complete
  • Apply rows to the Teradata Database
  • Data is loaded using the Connection object’s PutRow function. The PutRow function accepts as its arguments a pointer to a single row of data in nullindicator mode format, and the length of the data row (including the length of the null‑indicator bytes). See Figure 5 for an illustration of the indicator‑mode input data format.

    The PutRow length is the total length of all the column values (including the length bytes for variable‑length values) plus the number of indicator bytes. See Data Format for information on how to determine the number of indicator bytes in a row.

    while (!endOfData) {
    /* Assemble a row in null-indicator format */
       ...
     /* Pass a row to the Teradata Database */
     conn->PutRow(buffer, length);
    }

    Although the application passes a row at a time, the Load, Update, and Stream drivers accumulate the rows in the CLIv2 buffer and then send them to the DBS by the buffer load.

    Once all data has been loaded, the acquisition method is completed by calling the Connection object’s EndAcquisition function. This must be done before the data can be applied.

    conn->EndAcquisition( );

    When the acquisition method has been completed, the data is applied in the Teradata Database by using the Connection object’s ApplyRows method. This method should only be called when using the Load or Update driver. The Stream driver automatically applies the changes which means the ApplyRows method should not be called when using the Stream driver.

    conn->ApplyRows( );

    Load Rows into a Teradata Database using PutBuffer

    The Teradata PT PutBuffer function is available for the Load, Update and Stream drivers. Using PutBuffer for the Update and Stream drivers will only provide performance improvement if the function call overhead to PutRow is costly. If that is not the case, continue to use PutRow for the Update and Stream drivers.

    There are five steps to using this method:

    1 Set the buffer mode attribute

    2 Query the buffer layout

    3 Load the buffers

    4 Inform Teradata PT that acquisition is complete

    5 Apply rows to the database (except when using the Stream driver)

    Before Initiate is called, set the buffer mode attribute to Yes in the Connection object as follows:

    conn->AddAttribute(TD_BUFFER_MODE,"Yes");

    In order to buffer the data for block loading, query the Connection object using the event method to obtain the layout for the data buffer. The event method should be queried using the TD_Evt_BufferLayout parameter as follows:

    returnValue = conn->GetEvent(TD_Evt_BufferLayout, &dataptr, &datalen);

    The layout returned by the event method for the buffer event contains four 4‑byte unsigned integers corresponding to the maximum buffer size, the row header size, the row length size, and the buffer trailer size.

    Note: These layout values will change depending on the user environment and may be defined differently in future releases. Always obtain these values from the event method before buffering data.

    Figure 4 is a diagram representing the layout of the Load driver data buffer.

    Figure 4: Data Buffer Format

    When constructing the data buffer, the user is responsible for filling in the row length for each row and the row data. The row length is two bytes for Load and Update operator and it is four bytes for Stream driver. The beginning of the row header and the buffer trailer should be left blank as they will be filled in by Teradata PT. The total size of the buffer cannot exceed the maximum buffer size returned by the buffer layout event.

    Buffers of data are loaded using the Connection object’s PutBuffer method. The PutBuffer method accepts as its arguments a pointer to a buffer of data, the length of the data buffer, and an indicator informing Teradata PT whether the data in the buffer is in indicator mode (a value of one) or in non-indicator mode (a value of zero). Data for the Update and Stream drivers must be in indicator mode (a value of one).

    while (!endOfData) {
     
       /* Assemble a buffer of rows in null-indicator format */
       ...
       
       /* Pass a buffer to the Teradata Database */
       conn->PutBuffer(buffer, length, 1);
    }

    Once all data has been loaded, the acquisition method is completed by calling the Connection object’s EndAcquisition method. This must be done before the data can be applied.

    conn->EndAcquisition( );

    When the acquisition method has been completed, the data is applied in the Teradata Database by using the Connection object’s ApplyRows method. This is not needed when using the Stream driver.

    conn->ApplyRows( );

    Note: When using the Load, Stream, or Update drivers, the PutBuffer and PutRow features can not be used in the same job.

    Export Data from a Teradata Database using GetRow

    Data can be exported one row at a time using the Connection object’s GetRow function. The GetRow function accepts as its arguments a pointer to an allocated buffer with enough room for one row of data and a pointer to a TD_Length object for storing the length of the data exported.

    while ( returnValue != TD_END_Method ) {
    	
    	/* Retrieve row from the Teradata Database */
    	returnValue = conn->GetRow(&buffer, &length);
    }

    The GetRow function will return the TD_Success value when a row of data has been successfully retrieved. The TD_End_Method value will be returned when all rows have been retrieved and the data acquisition process is complete.

    Export Data from a Teradata Database using GetBuffer

    Multiple rows can be exported together in a buffer using the Connection object’s GetBuffer function. Exporting an entire buffer full of rows using the GetBuffer function rather than one row at a time using the GetRow function improves performance by reducing the amount of data movement required. The GetBuffer function accepts as its arguments a pointer to an unallocated character buffer and a pointer to a TD_Length variable. The GetBuffer function uses these two arguments to return a pointer to a data buffer and to return the total length of that data buffer.

    conn->GetBuffer(&buffer,&length);

    The GetBuffer function returns the TD_Success value when a buffer of data is successfully retrieved. The buffer of data returned by the GetBuffer function must be immediately either copied or processed. The buffer used to return the data is overwritten on the next GetBuffer function call, and the buffer is destroyed when the Connection object is deleted. The GetBuffer function returns the TD_END_Method value when all rows have been retrieved and the data acquisition process is complete.

    In order to use the GetBuffer feature, set the TD_BUFFER_MODE attribute to Yes (or Y) before initiating the Connection object. The GetRow function cannot be called if the GetBuffer feature is enabled.

    Specify the size and format of the data buffer returned by the GetBuffer function by setting one or more of the following buffer layout attributes:

  • TD_BUFFER_MAX_SIZE
  • TD_BUFFER_HEADER_SIZE
  • TD_BUFFER_LENGTH_SIZE
  • TD_BUFFER_TRAILER_SIZE
  • These attributes correspond to the total maximum size of the data buffer, the row header size allocated for each row in the data buffer, the row length size used for each row in the data buffer, and the trailer size allocated for the data buffer. See Figure 4 on page 29 for a description of the format of the data buffer and Table 27 for a detailed description of each of the four buffer layout attributes and their default values.

    The layout of the data buffer returned by GetBuffer can alternatively be set by using the Export driver’s TD_Evt_BufferLayout modifier along with the PutEvent function. The Export driver’s TD_Evt_BufferLayout modifier can be used to set the data buffer layout any time after the Export driver has been initialized up until the first call to the Export driver’s GetBuffer function. The modifier returns TD_Unavailable if called after the first call to the GetBuffer function. See “PutEvent Modifiers” on page 166 for more information on how to use the Export driver’s TD_Evt_BufferLayout modifier.

    Transfer Data Using GetBuffer and PutBuffer

    Data exported using the Connection object’s GetBuffer function can be directly loaded using the Connection object’s PutBuffer function. This feature improves performance when data transfer is needed and no data transformation is required.

    To directly transfer data from the Export driver’s GetBuffer function to the Load, Update, or Stream driver’s PutBuffer function, the Export driver’s buffer layout attributes must be set using the buffer layout values returned by the event method for the Load, Update, or Stream drivers. This causes the Export driver to format the data returned by the GetBuffer function so that it will match the buffer layout required by the Load, Update, or Stream driver’s PutBuffer function. The following steps describe how to accomplish this transfer.

    To transfer data using GetBuffer and PutBuffer functions (non‑dynamic schema transfer)

    1 Create a Load, Update, or Stream driver and enable the PutBuffer feature by setting the TD_BUFFER_MODE attribute to Yes (or Y).

    2 Initiate the Load, Update or Stream driver.

    3 Create an Export driver and enable the GetBuffer feature by setting the TD_BUFFER_MODE attribute to Yes (or Y).

    4 Use the GetEvent function to query the TD_Evt_BufferLayout event for the Load, Update, or Stream driver and use the buffer layout values returned by this event to set the following attributes for the Export driver:

  • TD_BUFFER_MAX_SIZE
  • TD_BUFFER_HEADER_SIZE
  • TD_BUFFER_LENGTH_SIZE
  • TD_BUFFER_TRAILER_SIZE
  • After completing these steps, the Export driver can be initiated and the data retrieved by the Export driver’s GetBuffer function can be directly loaded using the Load, Update, or Stream driver’s PutBuffer function.

    When using the Export driver’s dynamic feature and sharing the dynamic schema between drivers, the Export driver must be initiated first instead of last. In this situation, use the following alternative procedure.

     

    Transferring Data using the Dynamic Schema Feature with GetBuffer and PutBuffer Functions

    To implement the Export driver’s dynamic schema feature while using GetBuffer and PutBuffer to transfer data, follow these steps:

    1 Create an Export driver and enable the GetBuffer feature by setting the TD_BUFFER_MODE attribute to Yes (Y).

    2 Create a Load, Update, or Stream driver and enable the PutBuffer feature by setting the TD_BUFFER_MODE to Yes (Y).

    3 Initiate the Export driver.

    4 Initiate the Load, Update, or Stream driver.

    5 Use the GetEvent function to query the TD_Evt_BufferLayout event for the Load, Update, or Stream driver.

    6 Use the PutEvent function to pass the event data returned by the Load, Update, or Stream driver’s TD_Evt_BufferLayout event directly into the Export driver’s TD_Evt_BufferLayout modifier.

    At this point, data can be retrieved from the Export driver using the GetBuffer function and loaded directly using the Load, Update, or Stream driver’s PutBuffer function.

    Note: The buffer layout values for the Load, Update, or Stream drivers change depending on the user environment and may be defined differently in future releases. Always obtain these values from the event method when transferring data between the Export driver’s GetBuffer function and the Load, Update, and Stream drivers’ PutBuffer function.

    Data Format

    The Teradata PT PutRow and GetRow methods currently only support the indicator-mode input data format:

  • A variablelength indicator bytes field
  • A variablelength input data field
  • The positions of the indicator bits correspond to the fields of the loading and exporting row. There is one bit for each column in the table. The first bit in the first byte is the indicator for the first field in the record. If an indicator bit is set to one, the Teradata Database nulls the corresponding field when the record is loaded. If the indicator bit is set to zero, the Teradata Database loads the data specified for that field.

    For example, when there are less than 9 columns, there is a 1‑byte indicator‑byte length. Between 9 and 16 columns, there is a 2‑byte indicatorbyte length. There is a 3‑byte indicator‑byte length between 17 and 24 columns, etc.

    Figure 5: Bytes Illustrating Indicator-Mode Input Data Format

    Find more information about the indicator byte in Teradata FastLoad Reference. See “Additional Information” on page 5.

    When using the Teradata PT PutBuffer function with the Load driver, the input data can be in indicator mode or in non‑indicator mode format. When using the PutBuffer function with the Update or Stream driver, the input data must be in indicator mode format.