Considerations For Using Sparse Join Indexes With Dense NUPIs
Selecting the primary index for a sparse join index that has thousands of rows per
value, with each AMP controlling some percentage of these values, provides several
benefits and carries few of the negatives associated with a high number of duplicate
primary index values.
During join index creation there is no duplicate row checking as there is with a base
table, so one of the principal reasons to avoid such high numbers of duplicates on
a primary index does not apply to the case of creating a join index.
The join itself can be more efficient with a higher numbers of NUPI duplicates because
when so many rows carry the same NUPI row-hash value, the physical I/O involved in
storing them can be less.
While balanced processing is always important in selecting a primary index for a base
table, the dense NUPI approach is appropriate for join indexes when it enables fast
query execution and replaces an all-AMP alternative that would process only a few
rows from each AMP.
On the other hand, it is not desirable to overload one AMP unduly, whether the access is single- or all-AMP. If
an inordinate number of data blocks would have to be processed by one AMP using the
dense NUPI approach, then parallelizing the work across all AMPs by selecting an alternative
primary index is probably a better choice for enhancing performance.
Designing a sparse join index to ensure that the number of distinct values in its
index primary index is greater than the number of AMPs in the system is a good strategy
to protect against too many queries being concentrated on too few AMPs. However, if
the queries are very short and are infrequent, that concern is less important.