Before you create a join index to optimize one or more of your frequently repeated queries, you should consider the following issues carefully:
- Cost of disk resources required to store a join index
- Cost of creating a join index
- Cost of maintaining a join index
- Benefits of reduced performance times for the queries the join index is designed to facilitate.
Cost is expressed as follows:
- Disk resources: Cost is expressed in bytes (space or size). Join indexes are space-intensive because they require separate tables, often of fairly high cardinality, for storage.
- Creation and maintenance: cost is expressed in time (elapsed time and CPU path per transaction).
Elapsed time is a fairly simple measure, but it is not comparable across different system configurations, nor is it independent of other workloads running on a system. It has the advantages of conceptual simplicity and ease of measurement. With respect to join index creation, it is a measure of how long it takes to create the index. With respect to maintenance, it is a measure of how long it takes to update a join index table beyond the time required to perform the update of the base table. In this sense, the term update refers generically to the data manipulation operations insert, update, and delete performed by any of the following SQL statements:
CPU path per transaction is a sophisticated measure that is comparable across different system configurations. It has the disadvantages of being both difficult to measure without the proper tools and difficult to understand conceptually.
Whether creation and maintenance cost is measured as elapsed time or as CPU path per transaction, the same proportional relationship holds: the smaller the number, the better.
The total cost for a join index is the sum of its creation and maintenance costs.
Join Index Costs Summary
Creation and maintenance costs for join indexes can be a resource burden because of their processing overhead. In the case of join index maintenance, the burden is an ongoing process that lasts for the life of the index.
Costs vary considerably among the various types of joins used as well as between simple and aggregate types. The following bulleted list summarizes the conclusions to be drawn from the performance analyses performed.
- Maintenance overhead for in-place aggregate join indexes ranges from 1.0 to 2.9.
- Maintenance overhead for aggregate join indexes that redistribute rows ranges from 2.6 to 9.5.
- Maintenance overhead for simple join indexes ranges from 1.0 to 23.1, increasing as the number of hits per data block increases.
- Maintenance overhead for aggregate join indexes decreases as the number of hits per data block increases because of the efficiencies of block-at-a-time operations in the file system.
- Maintenance overhead for insert operations is significantly less than the maintenance overhead required for update and delete operations.
Benefit and Benefit Percentage
Benefit is a measure of the difference in elapsed time for the same query performed with and without a join index.
Benefit percentage is a normalized expression of benefit. It is defined as the product of benefit and 100 divided by the elapsed query time without a join index.
The larger the number, the better.
Payback is an expression of the number of queries that must be run to achieve a break even point for the total cost of the join index. Its value is derived from dividing total costs by benefit.
The smaller the number, the better.
Cost of Disk Resources
A join index table has the same properties as a base table except that it cannot be queried directly.
Because it is essentially a base table, the cardinality of a simple join index is generally of the same order as its component join table with the highest cardinality, with adjustments being necessary for row compression and other miscellaneous issues. The cardinality of an aggregate join index is typically much smaller than that of any of its component join tables.
The degree of either type of join index table is entirely dependent on how it is defined.
Irrespective of any other factors, join index tables extract a cost in terms of increased disk space requirements. If a join index table is fallback-protected, then it exerts twice again the burden in disk space.
The disk resources required for a join index are described in Join Index Storage.
Cost of Join Index Creation
The cost of creating a join index is described in terms of time: how long does it take to create the index? Equally important in many situations is the question of how many times a query for which a join index was designed must be performed to offset the creation time by recovering response time.
Different types of join index require a longer or shorter time to create.
If your processing environment requires that join indexes be created and dropped on a regular basis, the cost of creating the index becomes a critical factor that must be considered carefully in determining its worth.
Because of this creation overhead, you should evaluate the benefit of a join index created to enhance your standard queries vis-a-vis its creation cost if you must drop and create your join indexes frequently. If the gross effect of an index is negative with respect to overall system performance, you might not want to create it even if it enhances the performance of particular queries.
Unfortunately, it is not possible to predict join index creation times with any degree of accuracy because there are too many significant variables involved to be able to produce a useful predictive model. This multivariate complexity also prevents the derivation of a useful predictive model for maintenance overhead or even for query time reductions.
Example of Join Index Creation
Consider the following example. Empirical testing indicates that one of the simplest relationships is that between the time required to create an in-place aggregate join index and the number of rows in the base tables. Even this simple relationship is complicated by the fact that the resulting creation time is likely to be more closely related to the number of rows that qualify for the join rather than the total number of rows.
The following table indicates a measured relationship between the elapsed creation time for a join index and the number of qualified rows.
|Elapsed Time (seconds)||Number of Qualified Rows x 106|
These numbers, along with the total number of rows in each test and a least squares linear regression of number of qualifying rows on elapsed time, are graphed in the following figure:
Creation and Elapsed Query Times for Different Join Indexes
The following tables list the creation and elapsed query times for several different kinds of join indexes.
Note that when the definition of a join index can be covered by an existing join index, the existing index can be used to create the join index.
In the following table, when calculating the values for the Query Ratio and Benefit columns, query times of less than 1 second were rounded up to 1.0 seconds.
|Approximate Creation Costs and Query Benefits in Terms of Elapsed Query Time|
|Join Index Type||Creation Cost (seconds)||Elapsed Query Time (seconds)||Query Ratio (seconds)||Benefit (seconds)||Benefit Percent|
|Without Join Index||With Join Index|
|Single-table aggregate||51||41||< 1||41.00||40||98|
|2-table in-place aggregate||135||107||< 1||107.00||106||99|
|4-table in-place aggregate||279||243||< 1||243.00||242||100|
|2-table in-place outer join||231||147||98||1.50||49||33|
|2-table foreign aggregate||232||230||< 1||230.00||229||100|
|2-table ad hoc aggregate||397||357||< 1||357.00||356||100|
In the following table, the figure <<1 means the CPU path per row time was too short to measure. When calculating the values for the Benefit column, query times of less than 1 second were rounded up to 1.0 seconds. The Benefit value is based on elapsed query time rather than CPU path per row time.
|Creation Costs and Query Benefits in Terms of CPU Path Per Row|
|Join Index Type||Creation Cost (μseconds)||Query Time CPU Path Per Row (μseconds)||Benefit||Benefit Percentage|
|Without Join Index||With Join Index|
|Single-table aggregate||9||7||<< 1||40||98|
|2-table in-place aggregate||7||6||<< 1||106||99|
|4-table in-place aggregate||10||8||<< 1||242||100|
|2-table in-place outer join||16||11||7||49||33|
|2-table foreign aggregate||19||19||<< 1||229||100|
|2-table ad hoc aggregate||34||30||<< 1||356||100|
You can use these figures, which are derived from tests on a two node system, to scale an approximately linear adjustment for your own table sizes and configuration.
Cost of Join Index Maintenance
The largest resource burden for any join index is incurred by its maintenance: inserting, updating, and deleting rows in the join index table. Similar to creation costs, various types of join indexes incur very different maintenance costs.
Maintenance Costs of Join Indexes
Join indexes can be expensive to maintain. For many customers, the most important factor in the decision to use a join index is likely to be how much it costs to maintain.
Each time a join-indexed base table column is updated, the corresponding join index table column must also be updated. Each time a new row is added to or an existing row is deleted from a join-indexed base table, the corresponding join index table rows must also be inserted or deleted.
Because of this maintenance overhead, you should always carefully evaluate the benefit of a join index created to enhance your standard queries vis-a-vis its cost to create and maintain (see Cost/Benefit Analysis for Join Indexes).
Maintenance Cost Optimizations Based on Foreign Key-Primary Key Joins
You should consider the added cost of join index maintenance carefully when you are designing the indexes for your data warehouse to ensure that the minimum number of join indexes can be called upon by the Optimizer to cover the maximum number of queries. Designing with foreign key-primary key joins allows you to make these optimizations.
Whenever a base table column set that is shared with a join index is updated or deleted, or when a new row is inserted into the base table, the system generates extra steps to maintain the base table and join index concurrently. If the base table is specified as part of an outer join in the join index definition, the steps can be more complex because maintenance might be needed for both matched and unmatched row sets.
However, when the join columns have a foreign key-primary key relationship, the system treats inner and outer joins alike (see Restriction on Coverage by Join Indexes When a Join Index Definition References More Tables Than a Query).
Maintenance Cost Optimizations for DELETE ALL Operations
A fastpath optimization is one that can be performed faster if certain conditions are met. For example, in some circumstances DELETE and INSERT operations can be performed faster if they can avoid reading the data blocks and avoid transient journaling.
Teradata Database uses both fastpath and deferred fastpath row partition DELETE operations for the following cases:
- Deferred row partition deletion on a row-partitioned base table when a join index defined on the base table is not row-partitioned
- Deferred partition deletion on a row-partitioned join index that is defined on a table
- Deferred partition deletion on both a row-partitioned join index and its row-partitioned base table
In this case, Teradata Database performs the fastpath row partition deletion operations on the join index and the base table independently.
Teradata Database can perform fastpath DELETE ALL operations, but not deferred row partition deletion operations, for the following cases:
- If the deleted table has a a conditional DELETE with predicates and it covers the entire join index, the join index is eligible for a fast path DELETE.
- All single-table join indexes.
- A multitable join index when the join between the tables is either an inner join or the table being deleted is the outer table in the join.
- An implicit transaction with a single-statement DELETE ALL table_name when the table has a join index defined on it.
- An implicit transaction with a multistatement request.
- An ANSI/ISO session mode transaction with a multistatement request.
- A Teradata session mode transaction with a multistatement request.
Types of Join Index Examined for This Analysis
Different types of join indexes incur different costs of maintenance. For the analyses provided here, several different types of join index were used. The types are far from exhaustive, but they provide a fairly representative range of data that you can use to extrapolate roughly how much benefit a join index that uses a particular type of join is likely to provide.
The following table provides a list of the various types of joins defined for the join indexes used in this study. When you think of types of joins, you probably think of join processing types like merge join, nested join, product join, and so on.
The types of joins defined here (in-place, foreign, and ad hoc) are unrelated to those join processing types. The types of joins and the types of join indexes go hand in hand. In both cases, it is the number of tables that are being redistributed that determines the type of join or join index being described.
Each join type can be used with either a simple or an aggregate join index.
|In-place||The joined tables have a common primary index and are joined on that column set.
A common example is a logical entity-subentity relationship as seen in an Employee - Employee_Phone relationship, where both tables have Employee Number as a common primary index.
As a result, rows to be joined are always on a common AMP and do not have to be redistributed.
This is the least expensive join of the three types examined.
|Foreign||The tables are joined on a primary key - foreign key column set relationship, where the primary key column set is also the primary index column set. The primary index of the other join table is typically a foreign key in the first table.
A common example is a join between an Employee table and a Department table, where the join is on the common Employee Number column, which is the primary index for the Employee table, but is only a foreign key in the Department table.
As a result, the rows to be joined must be redistributed from the AMP having the Department table rows to the AMP having the Employee table rows.
This join is more expensive than the homogeneous primary index join, but less expensive than the ad hoc join.
|Ad hoc||The tables are joined on a column set that is not a primary index in either table.
As a result, rows from both tables must be redistributed to make the join.
This join is the most expensive of the three types examined.