15.00 - Inserting Rows Into a Column-Partitioned Table Using Teradata Parallel Data Pump - Teradata Database

Teradata Database Design

prodname
Teradata Database
vrm_release
15.00
category
User Guide
featnum
B035-1094-015K

Inserting Rows Into a Column‑Partitioned Table Using Teradata Parallel Data Pump

Teradata Parallel Data Pump array INSERT operations use a specialized SQL client‑server protocol that enables multiple data parcels to be passed in a single request. When Teradata Database detects the presence of multiple data parcels in a request, it iterates the execution of that request for each data parcel by binding its user data references on each iteration to the data record passed in the corresponding data parcel.

There are two main performance benefits from using Teradata Parallel Data Pump array INSERT operations.

  • Request text size
  • The request text in an iterated request such as a Teradata Parallel Data Pump array INSERT request specifies one single instance of an INSERT statement, so it is independent of the number of times the INSERT operation is iterated.

    When you preface a DML request with a USING request modifier, it only specifies the columns that are referenced by a single iteration. Therefore, the size of the request text for an iterated request can be greatly reduced when compared to an equivalent multistatement INSERT request. The smaller request text reduces the cost of building the request from the client and sending it to the server. The smaller request text also allows more data parcels per request and requires less cache space in the server and helps reduce the cost of searching the cache.

  • Multirow INSERT
  • For a Teradata Parallel Data Pump array INSERT request, Teradata Database can group multiple rows together by AMP ownership into a single AMP step. The processing of the rows is optimized by performing the inserts in the same step instead of multiple steps. This optimizes the following operations.

  • Reduces the CPU path because fewer steps are generated and sent or received between the PE and the AMP.
  • Reduces I/O because more rows can be inserted with one single I/O call.
  • The performance impact is directly dependent on the number of rows that can be grouped together in the same AMP step, which depends on the following factors.

  • The total number of data records in the request
  • A higher PACK factor causes there to be more rows in a request, which in turn leads to an increase in the average number of rows in the same step.

  • The system configuration, particularly the number of AMPs
  • Smaller systems with fewer AMPs increase the average number of rows in the same step with the same PACK factor. As the number of AMPs increases, the average goes down.

  • The clustering of inserts with same NUPI values
  • Inserts into NUPI table with many duplicate NUPI values make the average increase because rows with the same NUPI value all go to the same AMP.

    Because rows in both NoPI tables and column‑partitioned tables are not hashed based on a primary index, they can be dispatched to any AMP as desired. To maximize this benefit, a Teradata Parallel Data Pump array INSERT on a nonpartitioned NoPI table or a column‑partitioned table always packs as many rows as there are in the request from the client into the same AMP step. These rows are sent to the same AMP in one INSERT step. This is independent of the system configuration and the clustering of data.

    The sending of the rows in a Teradata Parallel Data Pump array INSERT job that are to be inserted into either a NoPI table or a column‑partitioned table uses a random generator that determines the destination AMP for each INSERT step. The random generator is designed to choose a different AMP from the one that the previous request sent data to ensure that data is spread across the AMPs as equally as it can be to avoid skew.

    With a constant PACK factor, the number of rows that are packed into the same AMP step on a primary‑indexed table depends on the number of AMPs in the system and how the data is clustered. As the number of AMPs increases and the clustering of data decreases, the number of rows that are packed into the same AMP step decreases.

     

    FOR a table with this kind of primary index …

    Data clustering …

    unique

    is generally minimal.

    non‑unique

    can be high, in which case the number of rows that are packed into the same AMP step is also high and is independent of the number of AMPs in the system.

    Therefore, a Teradata Parallel Data Pump array INSERT has the most benefit when comparing performance on NoPI or column‑partitioned tables with a UPI table on large systems with many AMPs. A Teradata Parallel Data Pump array INSERT does not have as much benefit when comparing performance on a NoPI or column‑partitioned table against the performance of a NUPI table with high data clustering.

     

    IF a column‑partitioned table has …

    THEN …

    row partitioning

    1 the rows to be inserted must be sorted in memory.

    2 the rows must be split into column partition values that are appended to the end of the internal combined partitions to which they belong.

    does not have row partitioning

    any improvement in performance can be offset by the processing that is required to do the following.

    1 Split the rows to be inserted into column partition values.

    2 Append those values to the ends of the corresponding column partitions.

    The performance of transforming rows into columns and appending column partition values is affected by the following factors.

  • Whether there are at least as many available column partition contexts as column partitions.
  • Remember that the Optimizer uses the setting of the DBS Control parameter PPICacheThrP (or the cost profile constant PPICacheThrP field) to determine the number of available column partition contexts that can be used at one time to append column partition values to their column partitions.

  • The number of column partition values inserted into a combined partition at one time.
  • The more column partition values inserted, the better.