Distribution of Join Index Rows - Advanced SQL Engine - Teradata Database

SQL Data Definition Language Detailed Topics

Advanced SQL Engine
Teradata Database
Release Number
July 2021
English (United States)
Last Update
Product Category
Teradata Vantage™

With the exception of column-partitioned NoPI join indexes, the hash distribution of join index rows across the AMPs is controlled by the specified PRIMARY INDEX or PRIMARY AMP INDEX clause. For a description of how Vantage assigns the rows of column-partitioned NoPI database objects to the AMPs, see Teradata Vantage™ - Database Design, B035-1094.

For most join indexes, a primary index must be nonunique. UPIs are only permitted for single-table join indexes. If you do not define a primary index or primary AMP index explicitly, the first column defined for the join index is assigned as the primary index, but only if the column is not partitioned. A primary AMP index is always nonunique.

By default, for a primary index, rows are sorted locally on each AMP by the hash code of the primary index column set. To sort by a single column of your choice, use the optional ORDER BY clause in the join index definition. With the ORDER BY clause, you can sort by raw data values rather than the default sort on the hash codes for the values.

Sorting a join index NUSI by data values, instead of hash codes, is especially useful for range queries that involve the sort key. For a comparison of value-ordered NUSIs versus row partitioning for join index range query support, see Choosing Between a Row-Partitioned Join Index and a Value-Ordered NUSI For Covering Range Queries.

Value-ordered NUSI storage provides better performance for queries that specify selection constraints on the value ordering column. NUSI value ordering is limited to a single four-byte numeric or DATE column.

For example, suppose a common task is to look up sales information by sales date. You can create a join index on the sales table and order it by sales date. The benefit is that queries that request sales by sales date only need to access those data blocks that contain the value or range of values that the queries specify.

In the following example, the join index rows are hash-distributed across AMPs using c_name as the nonunique primary index and are value-ordered on each AMP using c_custkey as the sort key:

SELECT (o_custkey, c_name), (o_status, o_date, o_comment)
FROM Orders
LEFT JOIN Customer ON o_custkey = c_custkey
ORDER BY o_custkey

If the join index you are creating is not row compressed, then you can define a row partitioning for it by specifying one or more partitioning expressions, which also optimizes the access to the index for range queries on the partitioning column set. See Partitioned Tables and Join Indexes.