15.10 - Advantages of Hashing Over Indexing - Teradata Database

Teradata Database Design

Teradata Database
User Guide

Advantages of Hashing Over Indexing

Why use hashing instead of indexing? The following list of advantages explains why hashing is superior to indexing for the Teradata parallel architecture.

  • For the majority of queries, hashing provides consistently better performance because rows are always distributed evenly across the AMPs, providing a consistently balanced workload across parallel units. Because Teradata Database uses B*-trees on a fixed‑length rowID rather than on the column values themselves, there are never more than two levels of the B*-tree, so there is no need to reorganize because the trees never become unbalanced and unordered. This is true for standard operation. System reconfiguration is necessary whenever you add nodes or AMPs to your system.
  • The Teradata file system B*-tree is implemented on the master index, cylinder index, and data block descriptor structures (see “Master Index” on page 194, “Cylinder Index” on page 195, and “Data Block” on page 196).

    B+ trees on variable‑length values can become unbalanced and then require a lengthy reorganization.

  • Primary indexes or primary AMP indexes are not stored in an index subtable. Instead, the rowid for an individual row is stored directly as part of the row header for the row. This access to data offers the following advantages over indexing of the same data.
  • Enhanced performance because of fewer I/Os
  • Decreased storage overhead because there are no primary index or primary AMP index subtables
  • Hashing is far better suited for the parallel database architecture pioneered by Teradata than is indexing. Because each table in a Teradata database is distributed horizontally across the AMPs on a row-by-row basis, the storage and retrieval algorithms and indexes used by standard file systems for storing and retrieving data are not as efficient as the distributed parallel Teradata architecture.
  • Because of its hashed data placement and parallel architecture, Teradata Database requires only a one-time decision on the primary index or primary AMP index for each table: the system takes care of everything else.

    Compare this with other relational products where a DBA must perform tasks such as determining keys and locations and coding space allocation for each partition before even beginning to create a table. Then the DBA must embed the partitioning assignments into the CREATE TABLE statement and forever after monitor and maintain partition sizes and perform periodic table reorganizations.

  • By carefully defining primary index or primary AMP index columns on frequently used join constraints, you can force rows that are frequently joined to be colocated on the same AMP. This is critical to join performance, because rows must be on the same AMP if they are to be joined. By colocating join‑constrained rows, you avoid the need to redistribute the rows from one or more tables across the BYNET to the AMP where they will be joined with rows located on that AMP.