15.00 - Row Allocation for Teradata Parallel Data Pump - Teradata Database

Teradata Database Design

prodname
Teradata Database
vrm_release
15.00
category
User Guide
featnum
B035-1094-015K

Row Allocation for Teradata Parallel Data Pump

Because NoPI tables and column‑partitioned tables have no primary index, Teradata Database must build their rowhash value differently than it does for primary‑indexed rows, generating a random rowhash value in such a way that for a new request, data is usually sent to a different AMP than the AMP the previous request distributed its rows to. This method attempts to balance the distribution of rows across the AMPs as much as is possible without having a primary index to hash. The Query ID for the request is used for this purpose because it uses the PE vproc ID in its high digits and a counter-based value in its low digits, which supports both single‑session and multiple-sessions INSERT operations equally well.

Rows in a NoPI table or column‑partitioned table are not hashed based on a primary index, so Teradata Database can distribute them to any AMP as desired. To maximize this benefit, Teradata Parallel Data Pump array inserts on a NoPI table or column‑partitioned table always pack as many rows as there are in the request from the client into the same AMP step. The rows are sent to the same AMP in one insert step. The distribution of the rows in a Teradata Parallel Data Pump job that are to be inserted into a NoPI table or column‑partitioned table uses a random generator that determines the AMP destination for each insert step (see next paragraph). The random generator choose a different AMP from the one that the previous request distributed rows to, spreading the rows across the AMPs in a way that avoids skewing.

Teradata Database creates a random rowhash value to use for the distribution step, using the value of the query ID for the request as input to the randomization algorithm. The logical process involved in this assignment is described in the table following the graphic in the topic “Row Allocation for Primary‑Indexed Tables” on page 235.

Refer to the following graphic for a pictorial illustration of the row allocation process for Teradata Parallel Data Pump array inserts into NoPI and column‑partitioned table rows.

There are two cases for a simple insert operation.

  • Only a single data row is processed.
  • Multiple data rows are processed for a Teradata Parallel Data Pump array INSERT request.
  • For both cases, the system copies the randomly generated rowhash value into the rowID of each row being processed as soon as it has been built. A randomly generated rowhash value is generated once per request so that in the case of a Teradata Parallel Data Pump array insert, the same rowhash value is used for all of the rows in the request. That means that Teradata Database sends all rows for that request to the same AMP, possibly in the same step.

    The source spool for rows that are inserted into NoPI tables and column‑partitioned tables by INSERT … SELECT operations are appended AMP‑locally to their target table, so there is no need to hash them to a destination AMP.

    You should consider the following recommendations for Teradata Parallel Data Pump array inserts into nonpartitioned and column‑partitioned tables.

  • If the order of data application is not important, set the SERIALIZE option to OFF.
  • Do not set the number of sessions for a Teradata Parallel Data Pump job to a larger value than the number of AMPs on the system.