When choosing the primary index for a table, there are two essential factors to keep in mind: uniform distribution of the data and optimal access.
With respect to uniform data distribution, consider the following factors:
- The more distinct the primary index values, the better.
- Rows having the same primary index value are distributed to the same AMP.
- Parallel processing is more efficient when table rows are distributed evenly across the AMPs.
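The effect of value distinctness on distribution can be illustrated with a small simulation. This is only a sketch: the AMP count (`NUM_AMPS`), the helper names `amp_for` and `rows_per_amp`, and the use of MD5 as the hash are assumptions made for illustration and do not represent the actual row-hashing algorithm.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # hypothetical system size, chosen for illustration


def amp_for(index_value, num_amps=NUM_AMPS):
    """Map a primary index value to an AMP number.

    MD5 is a stand-in hash (an assumption), not the real row-hash function.
    Equal values always hash to the same AMP.
    """
    digest = hashlib.md5(repr(index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(index_values, num_amps=NUM_AMPS):
    """Return the number of rows each AMP would receive."""
    counts = Counter(amp_for(v, num_amps) for v in index_values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


# 10,000 rows with a distinct index value per row.
unique_ids = list(range(10_000))
# 10,000 rows that share only three distinct index values.
few_values = [i % 3 for i in range(10_000)]

print("distinct values:", rows_per_amp(unique_ids))
print("three values:   ", rows_per_amp(few_values))
```

With distinct values the rows spread roughly evenly across the simulated AMPs. With only three distinct values, at most three AMPs receive any rows, and each of those holds at least a third of the table.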
With respect to optimal data access, consider the following factors:
- Choose the primary index on the most frequently used access path. For example:
  - If rows are generally accessed by a range query, consider defining a PPI on the table that creates a useful set of partitions.
  - If the table is frequently joined with a specific set of tables, consider defining the primary index on the column set that is typically used as the join condition.
- Primary index operations must provide the full primary index value.
- Primary index retrievals on a single value are always one-AMP operations.
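The last two points can be sketched in a toy model in which each AMP is a list of rows. Everything here is hypothetical: the two-column index (`store_id`, `order_id`), the helper names, the AMP count, and the MD5 stand-in hash are illustrative assumptions, not the real implementation.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size


def amp_for(index_value, num_amps=NUM_AMPS):
    """Stand-in hash (MD5, an assumption) mapping an index value to an AMP."""
    digest = hashlib.md5(repr(index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def load(rows, num_amps=NUM_AMPS):
    """Distribute rows to AMPs by hashing the full (store_id, order_id) value."""
    amps = [[] for _ in range(num_amps)]
    for row in rows:
        key = (row["store_id"], row["order_id"])
        amps[amp_for(key, num_amps)].append(row)
    return amps


def get_by_full_index(amps, store_id, order_id):
    """Full index value supplied: hash it and read only the owning AMP."""
    key = (store_id, order_id)
    target = amp_for(key, len(amps))
    hits = [r for r in amps[target] if (r["store_id"], r["order_id"]) == key]
    return hits, 1  # (matching rows, number of AMPs touched)


def get_by_partial_index(amps, store_id):
    """Only part of the index supplied: the hash cannot be computed,
    so every AMP has to be scanned."""
    hits = [r for amp in amps for r in amp if r["store_id"] == store_id]
    return hits, len(amps)


rows = [{"store_id": s, "order_id": o} for s in range(5) for o in range(20)]
amps = load(rows)

full_hits, full_touched = get_by_full_index(amps, 3, 7)
part_hits, part_touched = get_by_partial_index(amps, 3)
print(len(full_hits), "row(s) from", full_touched, "AMP")
print(len(part_hits), "row(s) from", part_touched, "AMPs")
```

In this model, supplying both index columns lets the lookup go straight to one AMP, while supplying only `store_id` forces a scan of all of them.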
Although the columns you choose as the primary index for a table are often the same columns that define its primary key, primary indexes also often comprise columns that are neither unique nor components of the primary key for the table.