Primary Index Value Retrieval Access: Guidelines for Nonpartitioned Selection
Keeping in mind that the principle goal for selecting the primary index for a table should always be achieving an even distribution of rows across the AMPs, the primary guideline for selecting a primary index to optimize retrieval should be based on the access demographics of the table. To facilitate optimal row access, choose a single column or, less preferably, a set of several columns, that is most frequently used to access the table. In other words, define the primary index for a table on a column set that is most frequently equated to discrete values in WHERE clause predicates in your application environment.
The reason for defining the primary index on the smallest possible column set is that you cannot hash or retrieve on a partial index value, so if a query condition specifies only a subset of the primary index column set, the Optimizer cannot build an access plan that uses the primary index. Note that this is also true for USIs.
Conversely, if you specify too few columns in the primary index definition, then each primary index value might correspond to a large number of rows, a situation that often not only causes data skew, but also degrades any data maintenance that must touch all rows in a row hash.
The goal of this guideline is to maximize the number of single-AMP (primary index only) operations.