- Uniform data distribution (the most important consideration)
- Optimal access to the data
- The volatility of indexed column values
- Select columns that consist mainly of unique, distinct values.
This is the most important consideration. Columns with distinct values distribute data rows evenly across all AMPs in the configuration. This maximizes parallel processing by allowing all the processors to participate in processing the data rows for a request.
Avoid using columns with a small number of distinct values that are repeated frequently or columns that have many nulls. This will cause uneven distribution of data rows resulting in some AMPs having more rows to process than others, and will increase the potential for performance bottlenecks.
Recommendation: Select columns having significantly more distinct values than the number of AMPs in the configuration.
- Select columns that are most frequently used in equality predicate conditions.
- Select columns that are most frequently used to access rows.
The PI provides the most efficient method to access data in a table. Therefore, choosing the PI on the most frequently used access path provides for optimal data access. For example, if the table is frequently joined with a specific set of tables, consider defining the PI on the column set that is typically used as the join condition.
Equality conditions permit direct access to the row. Inequality conditions require additional processing.
- Select columns that do not have any of the following data types: BLOB, CLOB, Period, UDT, ST_Geometry, MBR.
- Choose as few columns as possible for the PI to optimize its generality.
- Select primary indexes so that query plans use AMP-local processing and avoid row redistributions.
- Select columns that are not volatile.
Volatile columns force frequent row redistribution because Teradata Database assigns table rows to AMPs based on the PI values.Several DBSControl values, including those for AMP buffer size, affect actual row distribution.
Recommendation: Select a PI column set that will never be (preferred) or rarely be updated.
You may encounter conflicts when selecting the optimum primary index for a table. For example, choosing a PI that gives good distribution of data across the AMPs must be balanced against choosing a PI that reflects the most common usage pattern of the table. Also, some applications might favor one type of primary index, while other applications might perform optimally using a different primary index.
Recommendation: Make even data distribution, not data access patterns, the main priority when selecting a primary index. Tables and queries change. Access patterns change. However, data distribution does not. You can always add additional non-primary indexes, such as secondary and join indexes, to facilitate particular applications.
|Topic||Resources for Further Information|
|No primary index tables||Teradata Vantage™ - Database Design, B035-1094|
|Referential integrity||Teradata Vantage™ SQL Fundamentals, B035-1141|