16.20 - Distribution of Join Index Rows - Teradata Database - Teradata Vantage NewSQL Engine

Teradata Vantage™ SQL Data Definition Language Detailed Topics

Product
Teradata Database
Teradata Vantage NewSQL Engine
Release Number
16.20
Release Date
March 2019
Content Type
Programming Reference
Publication ID
B035-1184-162K
Language
English (United States)

With the exception of column-partitioned NoPI join indexes, the hash distribution of join index rows across the AMPs is controlled by the specified PRIMARY INDEX or PRIMARY AMP INDEX clause. For a description of how Teradata Database assigns the rows of column-partitioned NoPI database objects to the AMPs, see Teradata Vantage™ - Database Design, B035-1094.

For most join indexes, a primary index must be nonunique. UPIs are only permitted for single-table join indexes. If you do not define a primary index or primary AMP index explicitly, the first column defined for the join index is assigned as the primary index, but only if the column is not partitioned. A primary AMP index is always nonunique.

By default for a primary index, rows are sorted locally on each AMP by the hash code of the primary index column set. To sort by a single column of your choice, use the optional ORDER BY clause in the join index definition. With the ORDER BY clause, you can sort by raw data values rather than the default sort on the hash codes for the values.

Sorting a join index NUSI by data values, instead of hash codes, is especially useful for range queries that involve the sort key. For a comparison of value-ordered NUSIs versus row partitioning for join index range query support, see Choosing Between a Row-Partitioned Join Index and a Value-Ordered NUSI For Covering Range Queries.

Value-ordered NUSI storage provides better performance for queries that specify selection constraints on the value ordering column. NUSI value ordering is limited to a single four-byte numeric or DATE column.

For example, suppose a common task is to look up sales information by sales date. You can create a join index on the sales table and order it by sales date. The benefit is that queries that request sales by sales date only need to access those data blocks that contain the value or range of values that the queries specify.

In the following example, the join index rows are hash-distributed across AMPs using c_name as the nonunique primary index and are value-ordered on each AMP using c_custkey as the sort key:

    CREATE JOIN INDEX ord_cust_idx AS
    SELECT (o_custkey, c_name), (o_status, o_date, o_comment)
    FROM Orders
    LEFT JOIN Customer ON o_custkey = c_custkey
    ORDER BY o_custkey
    PRIMARY INDEX (c_name);

If the join index you are creating is not row compressed, then you can define a row partitioning for it by specifying one or more partitioning expressions, which also optimizes the access to the index for range queries on the partitioning column set. See Partitioned Tables and Join Indexes.