15.00 - Cost of Join Index Creation - Teradata Database

Teradata Database Design

prodname
Teradata Database
vrm_release
15.00
category
User Guide
featnum
B035-1094-015K

Cost of Join Index Creation

The cost of creating a join index is described in terms of time: how long does it take to create the index? Equally important in many situations is the question of how many times a query for which a join index was designed must be performed to offset the creation time by recovering response time.

Different types of join index require a longer or shorter time to create.

If your processing environment requires that join indexes be created and dropped on a regular basis, the cost of creating the index becomes a critical factor that must be considered carefully in determining its worth.

Because of this creation overhead, you should evaluate the benefit of a join index created to enhance your standard queries vis‑a-vis its creation cost if you must drop and create your join indexes frequently. If the gross effect of an index is negative with respect to overall system performance, you might not want to create it even if it enhances the performance of particular queries.

Unfortunately, it is not possible to predict join index creation times with any degree of accuracy because there are too many significant variables involved to be able to produce a useful predictive model. This multivariate complexity also prevents the derivation of a useful predictive model for maintenance overhead or even for query time reductions.

Example

Consider the following example. Empirical testing indicates that one of the most simple relationships is that between the time required to create an in-place aggregate join index and the number of rows in the base tables. Even this simple relationship is complicated by the fact that the resulting creation time is likely to be more closely related to the number of rows that qualify for the join rather than the total number of rows.

The following table indicates a measured relationship between the elapsed creation time for a join index and the number of qualified rows.

 

Elapsed Time (seconds)

Number of Qualified Rows x 106

                                         133

                                            90

                                         415

                                          300

                                      2,344

                                          900

These numbers, along with the total number of rows in each test and a least squares linear regression of number of qualifying rows on elapsed time, are graphed in the following figure:

The equation for this regression is as follows:

This translates into physical terms as follows:

where the value for Create time is expressed in elapsed seconds and the order of magnitude for Number of qualified rows is expressed in millions of rows.

The computed coefficient of determination for this regression, R2, is 0.9819. This means that 98% of the variance in the sample data is accounted for by this equation, for which the correlation coefficient is 0.991. In other words, the line is nearly a perfect fit to the data.

A sanity check of the data reveals that this regression might not always be as accurate as it seems to be upon initial scrutiny. For example, consider the 300 million row case. The measured elapsed time value is 415 seconds, but the predicted value is 595.23 seconds, a difference of just over 180 seconds, which is an error of 43%.

There are several possible explanations for this discrepancy, but the message to be taken away from this example is that while the model performs well in predicting the general direction of how much time it should take to create a join index with a given number of qualifying rows, the absolute accuracy of the prediction might vary considerably at some points on the curve.

Creation and Elapsed Query Times for Different Join Indexes

The following tables list the creation and elapsed query times for several different kinds of join indexes.

Note that when the definition of a join index can be covered by an existing join index, the existing index can be used to create the join index.

In the following table, when calculating the values for the Query Ratio and Benefit columns, query times of less than 1 second were rounded up to 1.0 seconds.

 

Approximate Creation Costs and Query Benefits in Terms of Elapsed Query Time

Join Index Type

 

 

 

Creation Cost (seconds)

 

Elapsed Query Time (seconds)

Query Ratio (seconds)

 

Benefit (seconds)

 

Benefit Percent

 

Without Join Index

With
Join Index

Single-table aggregate

         51

        41

    < 1

      41.00

       40

       98

2-table in-place aggregate

       135

      107

    < 1

    107.00

     106

       99

4-table in-place aggregate

       279

      243

    < 1

    243.00

     242

     100

Single-table simple

       199

        80

      63

        1.27

       17

       21

2-table in-place

       190

      116

      78

        1.49

       38

       33

2-table in-place outer join

       231

      147

      98

        1.50

       49

       33

2-table foreign aggregate

       232

      230

    < 1

    230.00

     229

     100

2-table ad hoc aggregate

       397

      357

    < 1

    357.00

     356

     100

In the following table, the figure <<1 means the CPU path per row time was too short to measure. When calculating the values for the Benefit column, query times of less than 1 second were rounded up to 1.0 seconds. The Benefit value is based on elapsed query time rather than CPU path per row time.

 

Creation Costs and Query Benefits in Terms of CPU Path Per Row

Join Index Type

Creation Cost (μseconds)

 

Query Time CPU Path Per Row (μseconds)

 

Benefit

 

 

 

Benefit Percentage

 

 

 

Without Join Index

With Join Index

Single-table aggregate

           9

         7

   << 1

       40

        98

2-table in-place aggregate

           7

         6

   << 1

     106

        99

4-table in-place aggregate

         10

         8

   << 1

     242

      100

Single-table simple

         33

       14

       11

       17

        21

2-table in-place

         14

       10

         7

       38

        33

2-table in-place outer join

         16

       11

         7

       49

        33

2-table foreign aggregate

         19

       19

   << 1

     229

      100

2-table ad hoc aggregate

         34

       30

   << 1

     356

      100

You can use these figures, which are derived from tests on a two node system, to scale an approximately linear adjustment for your own table sizes and configuration.