15.00 - Distribution of Join Index Rows - Teradata Database

Teradata Database SQL Data Definition Language Detailed Topics

Product
Teradata Database
Release Number
15.00
Content Type
Programming Reference
Publication ID
B035-1184-015K
Language
English (United States)

Distribution of Join Index Rows

With the exception of column‑partitioned join indexes, the hash distribution of join index rows across the AMPs is controlled by the specified PRIMARY INDEX clause. This topic does not describe how Teradata Database assigns the rows of column‑partitioned database objects to the AMPs. See Database Design for explanations of the various ways that Teradata Database assigns the rows of column‑partitioned database objects to the AMPs.

Note that the primary index for most join indexes must be non‑unique: UPIs are only permitted for single‑table join indexes. If you do not define a primary index explicitly, then the first column defined for the join index is assigned to be its primary index.

By default, rows are sorted locally on each AMP by the hash code of the primary index column set. To sort by a single column of your choice, use the optional ORDER BY clause in the join index definition. With the ORDER BY clause, you can sort by raw data values rather than the default sort on the hash codes for the values.

Sorting a join index NUSI by data values, as opposed to hash codes, is especially useful for range queries that involve the sort key (see “Guidelines for Choosing Between a Partitioned Join Index and a Value‑Ordered NUSI For Covering Range Queries With a Join Index” on page 353 for a comparison of value‑ordered NUSIs versus SLPPIs for join index range query support).

Value‑ordered NUSI storage provides better performance for queries that specify selection constraints on the value ordering column. NUSI value ordering is limited to a single four-byte numeric or DATE column.

For example, suppose a common task is to look up sales information by sales date. You can create a join index on the sales table and order it by sales date. The benefit is that queries that request sales by sales date only need to access those data blocks that contain the value or range of values that the queries specify.

In the following example, the join index rows are hash‑distributed across AMPs using c_name as the non‑unique primary index and are value-ordered on each AMP using c_custkey as the sort key:

    CREATE JOIN INDEX ord_cust_idx AS
    SELECT (o_custkey, c_name), (o_status, o_date, o_comment)
    FROM Orders 
    LEFT JOIN Customer ON o_custkey = c_custkey
    ORDER BY o_custkey
    PRIMARY INDEX (c_name);

If the join index you are creating is not row compressed, then you can define a PPI for it by specifying one or more partitioning expressions, which also optimizes the access to the index for range queries on the partitioning column set (see “Partitioned Tables and Join Indexes” on page 343).