Other Applications of Hashing for Parallel Processing
Teradata Database employs row hashing for tasks beyond simple row distribution and retrieval.
Several types of join processing hash on join column values to determine which AMP is to handle the join of the particular rows. This enables Teradata Database to balance the join load across all AMPs, with each doing an even subset of the total work.
Similarly, the row hash match scan method sorts rows to be joined on their row hash values, then scans the joined tables in parallel to permit the scan of one of the tables to skip ahead in its hash scan to the row hash where the second table is already positioned, as indicated by the following graphic:
In this example, Table A reads and joins row 1 to row 1 of Table B. Table A then obtains the row hash value of the next row in Table B, row 9, and joins its row 9 to row 9 of Table B, which has a matching row hash value. Unmatched rows are skipped without being read by hashing to the next highest candidate value for which a join might be possible. This processing of candidate joined rows can shorten the time required to perform this join significantly. See SQL Request and Transaction Processing for more information about this join method.
Aggregate processing also takes advantage of hashing to build its totals. For example, the following graphic indicates how each AMP aggregates its rows locally as the parallel first step in the global parallel aggregation process:
Following that, the fields in the GROUP BY key are hashed, and the resulting hash bucket for each distinct value points to the AMP responsible for building the global aggregate for that piece of the aggregate.
The illustration shows only 1 AMP per node for simplicity of presentation. A real Teradata system typically has multiple AMPs per node, and each would be involved in performing its own role in producing the global aggregate totals.