16.10 - Access Demographics - Teradata Database

Teradata Database Design

Product: Teradata Database
Release Number: 16.10
Release Date: June 2017
Content Type: User Guide
Publication ID: B035-1094-161K
Language: English (United States)

Access demographics are composed of:

  • Primary index value retrieval
  • Join access

Primary Index Value Retrieval Access: Definition

An access column is one that is commonly used as a valued predicate in a WHERE clause.

Although the column set chosen as the primary index for a table is often the same column set that defines the primary key, primary indexes are also often composed of columns that are neither unique nor components of the primary key for the table.

Primary Index Value Retrieval Access: Guidelines for Nonpartitioned Selection

Keeping in mind that the principal goal for selecting the primary index for a table should always be achieving an even distribution of rows across the AMPs, the primary guideline for selecting a primary index to optimize retrieval is to base the choice on the access demographics of the table. To facilitate optimal row access, choose a single column or, less preferably, a set of several columns, that is most frequently used to access the table. In other words, define the primary index for a table on a column set that is most frequently equated to discrete values in WHERE clause predicates in your application environment.

The reason for defining the primary index on the smallest possible column set is that you cannot hash or retrieve on a partial index value, so if a query condition specifies only a subset of the primary index column set, the Optimizer cannot build an access plan that uses the primary index. Note that this is also true for USIs.
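The following Python sketch is a toy model of why a partial primary index value cannot be used for hashed access. It is not Teradata's actual hashing algorithm: the SHA-256 stand-in, the four-AMP system size, and the column names `store_id` and `sale_date` are all assumptions made for illustration.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size, for illustration only


def amp_for(pi_values):
    """Map a complete primary index value (a tuple) to an AMP number.

    SHA-256 is only a stand-in for the real row hash function.
    """
    key = "|".join(str(v) for v in pi_values).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


def plan_access(pi_columns, predicate):
    """Return a (strategy, amp) pair for an equality predicate.

    predicate maps column names to the values they are equated to.
    Hashed access is possible only when every primary index column
    is present; a subset gives nothing that can be hashed.
    """
    if all(col in predicate for col in pi_columns):
        values = tuple(predicate[col] for col in pi_columns)
        return ("single-AMP", amp_for(values))
    return ("not via primary index", None)


pi = ("store_id", "sale_date")  # hypothetical two-column primary index
full = plan_access(pi, {"store_id": 17, "sale_date": "2017-06-01"})
partial = plan_access(pi, {"store_id": 17})
print(full)
print(partial)
```

In this model the request that supplies both columns resolves to exactly one AMP, while the request that supplies only `store_id` has no complete value to hash and must be satisfied some other way.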

Conversely, if you specify too few columns in the primary index definition, then each primary index value might correspond to a large number of rows, a situation that often not only causes data skew, but also degrades any data maintenance that must touch all rows in a row hash.
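As a rough illustration of the trade-off described above, the sketch below counts how many rows share each primary index value for a narrow and a wide index definition. The sample rows and the column layout (`store_id`, `sale_date`, `item_id`) are invented for this example and do not come from any real table.

```python
from collections import Counter

# Hypothetical rows: (store_id, sale_date, item_id)
rows = [(s, d, i) for s in range(2) for d in range(5) for i in range(10)]


def rows_per_index_value(rows, column_positions):
    """Count the rows that share each candidate primary index value."""
    return Counter(tuple(r[p] for p in column_positions) for r in rows)


narrow = rows_per_index_value(rows, (0,))      # index on store_id only
wide = rows_per_index_value(rows, (0, 1, 2))   # index on all three columns

print(len(rows), max(narrow.values()), max(wide.values()))
```

With only two distinct `store_id` values, every index value in the narrow definition carries half of the table, which is the situation that leads to skew and to expensive maintenance on a single row hash. The wide definition spreads the rows out, at the cost of requiring all three columns in a predicate before hashed access is possible.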

The goal of this guideline is to maximize the number of single-AMP (primary index only) operations.

Primary Index Value Retrieval Access: Partitioning Guidelines for Selection

The same guideline applies to partitioned access. The principal reason to define a table or uncompressed join index with row partitioning is to facilitate row partition elimination. Row partition elimination is analogous to how the Optimizer uses column projection and row restriction in that it eliminates partitions that are not relevant at the earliest possible stage in query processing.

The degree of row partition elimination depends both on the partitioning expression specified for the index and on the conditions specified in the query. Increasing the number of populated row partitions can degrade the performance of primary index access, joins, and aggregations on the primary index, but it also permits finer row partition elimination; therefore, it is critical to understand the nature of the applications that are the predominant users of a row-partitioned table or uncompressed join index.
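To make the idea of row partition elimination concrete, here is a minimal Python sketch of a table partitioned by calendar month. The monthly scheme, the 2017 start year, and the function names are assumptions chosen for illustration; the sketch models only the selection of relevant partitions, not how the Optimizer actually evaluates a partitioning expression.

```python
from datetime import date


def month_partition(d, start_year=2017):
    """Return the 1-based partition number for a monthly scheme
    whose first partition is January of start_year (hypothetical)."""
    return (d.year - start_year) * 12 + d.month


def partitions_to_scan(low, high, populated):
    """Keep only the populated partitions that overlap the date range
    low..high taken from the query condition."""
    wanted = range(month_partition(low), month_partition(high) + 1)
    return [p for p in populated if p in wanted]


populated = list(range(1, 13))  # twelve populated partitions for 2017
kept = partitions_to_scan(date(2017, 3, 15), date(2017, 5, 2), populated)
print(kept, len(kept) / len(populated))
```

A finer partitioning expression (for example, by day instead of by month) would let the same query condition discard a larger share of the table, but it would also increase the number of populated partitions that a primary index probe without a partitioning constraint has to visit.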

It is not always necessary for all values of the partitioning columns to be specified in a query for row partition elimination to occur (see Row-Partitioned and Nonpartitioned Primary Index Access for Typical Operations for details about the relative performance implications of various access methods on tables with partitioned or nonpartitioned primary indexes).

A partitioned primary index (PPI) provides optimal access to base table or join index rows while also providing efficient join and aggregation strategies on the primary index in other situations. With a constraint on the partitioning columns of a table or uncompressed join index, partitioned access performance can approach the performance of a nonpartitioned table depending on the degree to which the Optimizer can eliminate partitions from consideration by the query.

Access via an equality constraint on a primary index that also includes all the partitioning columns is as efficient as with a nonpartitioned table. If there is an equality constraint on a primary index that does not include all the partitioning columns, but there is an equality or other constraint on the partitioning columns that limits access to a single partition, then access is also as efficient as with a nonpartitioned table.

Access via an equality constraint on the primary index that neither includes nor constrains the partitioning columns might not be quite as efficient as with a nonpartitioned table, depending on the number of populated partitions (a partition is said to be populated when it contains rows).

Although you can narrow access by specifying a particular partition set using the PARTITION keyword, access to particular partitions is generally performed internally and need not be specified explicitly in a request.

IF a request … | THEN …
specifies values for all primary index and partitioning columns | a row can be retrieved by single AMP access from a single partition.
specifies values for all primary index columns and also specifies search conditions on the partitioning columns | row partition elimination can reduce the number of combined partitions that must be probed on a particular AMP.
specifies values for all primary index columns but does not specify search conditions on the partitioning columns | each combined partition can be probed individually to locate rows based on their hash value only.
specifies search conditions on the partitioning columns | row partition elimination can reduce an all-AMPs full table scan to an all-AMPs scan of only the combined partitions relevant to the query.
does not specify the values for all primary index columns and there are no constraints on the partitioning columns | the strategy the Optimizer elects to follow depends on whether a usable secondary, hash, or join index exists for the query, as explained by the following bullets.
  • If there is a cost effective secondary, hash, or join index, the Optimizer uses it.
  • If there is no cost effective secondary, hash, or join index, the Optimizer invokes an all-AMPs full-table scan.
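The decision table above can be restated as a small Python function. This is only a paraphrase of the table for readers who prefer code; the function name, parameter names, and the string labels for the partitioning constraint are hypothetical, and the real Optimizer weighs costs that this sketch ignores.

```python
def access_strategy(pi_values_given, partition_constraint=None,
                    usable_index=False):
    """Restate the IF/THEN table as code.

    pi_values_given      -- True if values are specified for all
                            primary index columns
    partition_constraint -- "all_values", "search_condition", or None
    usable_index         -- True if a cost-effective secondary, hash,
                            or join index exists
    """
    if pi_values_given and partition_constraint == "all_values":
        return "single-AMP access from a single partition"
    if pi_values_given and partition_constraint == "search_condition":
        return "probe only the combined partitions left after elimination"
    if pi_values_given:
        return "probe each combined partition by hash value"
    if partition_constraint is not None:
        return "all-AMPs scan of only the relevant combined partitions"
    if usable_index:
        return "use the secondary, hash, or join index"
    return "all-AMPs full-table scan"


print(access_strategy(True, "all_values"))
print(access_strategy(False, "search_condition"))
print(access_strategy(False, usable_index=True))
print(access_strategy(False))
```

The order of the checks mirrors the order of the table rows, so the most selective access path is considered first.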

About Join Access

Indexes are an extremely important component of any join. This does not apply to global temporary trace tables, which can neither have indexes nor be joined to other tables. See “CREATE GLOBAL TEMPORARY TRACE TABLE” in SQL Data Definition Language - Syntax and Examples, B035-1144 for details.

If, after implementing primary indexes for tables that are frequently joined, you detect significant performance issues when those tables are joined, you might want to reconsider your index column choices, particularly for commonly joined column sets in large tables. You might also want to consider adding columns to an existing index, depending on what the EXPLAIN output for the query shows.

Join Access: Column Selection Guidelines

When designing for join access, you should first consider selecting common join columns for use as primary indexes. Joining tables on their primary indexes permits the Optimizer to specify hash or merge joins, two highly effective techniques, in the join plan to further optimize a query.

The effectiveness of a merge join depends directly on whether it is made on primary indexes. The possible scenarios, ordered from best to worst performance, are as follows:

Join Predicate | Redistribution Action Required
Tables joined on their primary indexes (and partitioning columns, if there are any). | None.
Tables joined on their primary indexes, but not on all partitioning columns. | Direct join in some cases, otherwise redistribute and sort the rows to be joined.
One join column is a primary index; the other is not. | Qualified rows from one table must be redistributed and sorted.
Neither join column is a primary index. | Qualified rows from both tables must be redistributed and sorted.
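The scenarios in the table can be sketched as a simple classification in Python. The sketch deliberately ignores partitioning columns (the second row of the table), and the function name and the example column names (`cust_id`, `order_id`, `region`, `zip_code`) are hypothetical.

```python
def rows_to_redistribute(left_pi, right_pi, left_join_cols, right_join_cols):
    """Classify a join by how many of its tables must be redistributed.

    A table needs no redistribution when its join columns are exactly
    its primary index columns. Partitioning columns are not modeled.
    """
    left_on_pi = set(left_pi) == set(left_join_cols)
    right_on_pi = set(right_pi) == set(right_join_cols)
    if left_on_pi and right_on_pi:
        return "none"
    if left_on_pi or right_on_pi:
        return "one table"
    return "both tables"


# Both tables have cust_id as primary index and join on it.
best = rows_to_redistribute(["cust_id"], ["cust_id"], ["cust_id"], ["cust_id"])
# Left table is indexed on order_id but joined on cust_id.
middle = rows_to_redistribute(["order_id"], ["cust_id"], ["cust_id"], ["cust_id"])
# Neither table is joined on its primary index.
worst = rows_to_redistribute(["order_id"], ["cust_id"], ["region"], ["zip_code"])
print(best, middle, worst)
```

Rows that must be redistributed also have to be sorted before a merge join, which is why the cases are ordered from "none" to "both tables".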

For multicolumn joins, you should consider using all, or at least a subset, of the join columns as the primary index.

Analogously to designing for row access, the primary guideline for selecting a primary index for join access is to choose a column or, less preferably, a set of columns, that is most frequently used to make the join. In other words, define the primary indexes for the tables to be joined on a column set that is most frequently equated to discrete values in WHERE or ON clause predicates in your application environment.

If you define a primary index on too many columns, then those queries that do not specify all of the columns in the index are not assigned a hash or merge join by the Optimizer.

If you define a partitioned primary index with too large a number of populated partitions, join access performance can be degraded.