15.00 - Rowhash Value and RowID for NoPI Tables - Teradata Database

Teradata Database Design

Teradata Database
User Guide

Rowhash Value and RowID for NoPI Tables

Because neither nonpartitioned nor column‑partitioned NoPI tables have a primary index on which to base the rowhash value for a row, Teradata Database generates a random rowhash value for both nonpartitioned and column‑partitioned NoPI table rows based on different values, depending on whether the row is inserted using a simple insert (including array inserts) or an INSERT-SELECT. You can also load nonpartitioned NoPI tables, but not column‑partitioned tables, using the Fast Load utility, which also generates the rowhash for such tables differently (see “Row Allocation for FastLoad Operations Into Nonpartitioned NoPI Tables” on page 238).

For simple SQL insert operations, Teradata Database uses the system‑generated value of the Query ID to uniquely identify requests as input to the hashing algorithm to determine its destination AMP. The rowhash value generated by the hashing algorithm ensures that rows are ordinarily sent to a different AMP from the AMP selected for the previous request, which balances the distribution of rows among the AMPs as much as is possible without hashing on a primary index value. The randomly generated hash bucket values are as nonunique as possible to avoid excessive skewing of the row distribution. See “Row Allocation for Teradata Parallel Data Pump” on page 237 for details of this operation.

For simple insert operations, there are two possible cases.

  • A single row is to be dispatched.
  • This is the case for a standard INSERT operation.

  • Multiple rows are to be dispatched.
  • This is the case for the Array INSERT operations used by the Teradata Parallel Data Pump utility.

    For both cases, the system copies the generated rowhash value into the rowID of each row (see “Base Table Row Format” on page 740). Because the system uses the generated rowhash value only once per request, Teradata Database copies the same rowhash value into the rowID of all the rows processed by an Array INSERT operation, which means they are all sent to the same AMP, very possibly in the same step.

    For a column‑partitioned or nonpartitioned NoPI table, the higher order 20 bits of the hash value is a hash bucket that specifies the AMP for a referencing rowID. The system selects the hash bucket in increasing order as one defined for the AMP per the NoPI hash map on which the row is inserted.

    Teradata Database assigns rows randomly to AMPs or copies them locally. The hash bucket is not computed using a hash of the primary index columns because there is no primary index to hash. Because the full 32‑bit hash value is not used for a NoPI database object, the uniqueness value is 44 bits and is used as a row number, beginning at one and incrementing by one for each row appended to that partition and hash bucket.

    A 44‑bit uniqueness value enables a maximum of 17,592,186,044,415 rows per hash bucket. If the maximum for a hash bucket is exceeded, Teradata Database sets the hash bucket bits to the next bucket number for the AMP per the NoPI hash map, and resets the uniqueness value to 1. If the table or join index is not partitioned, the internal partition number is 0.

    For a column‑partitioned database object, the rowID is the same as for a NoPI table except that the object is column partitioned and, optionally, row partitioned. Therefore, the internal partition number is nonzero.

    Join indexes, hash indexes, and secondary indexes have referencing rowIDs whether there is a primary index or not, or whether the underlying table or join index is partitioned or not. For a rowID that references a column‑partitioned object, the column partition number (within the internal partition number) is set to 1 by convention and for consistency. A referencing rowID indicates a specific table row, not necessarily to a specific physical row. A rowID can easily be adjusted by adding a constant delta on dereferencing to a desired column partition number in order to access a physical row containing a specific column partition value for the table row.

    The rowID for a table row of a NoPI table without column partitioning is as follows.

       Internal Partition Number = 0, Hash Bucket = x, Uniqueness = n

    The rowID for a table row of a CP table is as follows.

       Internal Partition Number = 1, Hash Bucket = x, Uniqueness = n

    The Dispatcher distributes nonpartitioned and column‑partitioned NoPI table rows from an external source to the AMPs based on a randomly generated rowhash value. This means that if there is a large number of rows to distribute, their assignment eventually balances among the AMPs.

    If the source table data for an INSERT … SELECT statement is skewed, the target table data is skewed when you are loading an nonpartitioned NoPI or column‑partitioned table. This is because the rows selected by an INSERT … SELECT request are not redistributed across the AMPs, but are copied locally to the same AMP. To avoid this problem, you can specify an appropriate HASH BY clause with the SELECT subquery (see SQL Data Manipulation Language for information about the syntax and usage of the HASH BY clause in a SELECT request).

    Unlike rows loaded into an nonpartitioned NoPI table using Teradata Parallel Data Pump array inserts, rows loaded into an nonpartitioned NoPI table using the FastLoad utility are not parsed, but are instead sent directly to an AMP. Such rows are then hashed by the AMP software on their AMP vproc ID plus a counter value using a randomization algorithm that is different from the standard hashing algorithm. This generates the final rowhash value used for distribution. The receiving AMP determines the hash bucket and uniqueness value to be used as part of its rowID.

    Note: You cannot use the FastLoad utility to load rows directly into a column‑partitioned table. You can FastLoad rows into a staging table and them use an INSERT … SELECT statement to transfer them into the column‑partitioned table.