Join Index Maintenance Cost as a Function of Insert Method - Advanced SQL Engine

Join Index Maintenance Cost as a Function of Insert Method - Advanced SQL Engine - Teradata Database

Database Design

Product

Advanced SQL Engine

Teradata Database

Release Number

17.05

17.00

Published

June 2020

Language

English (United States)

Last Update

2021-01-22

dita:mapPath

qby1588121512748.ditamap

dita:ditavalPath

lze1555437562152.ditaval

dita:id

B035-1094

lifecycle

Product Category

Teradata Vantage™

For myriad reasons, some methods of inserting rows into tables are much more efficient than others. Four different methods of insertion are examined here for their efficiency in lowering the maintenance cost of join indexes. The following methods are reported.

Case 1: FastLoad and INSERT … SELECT the rows
Case 2: Drop the join index, insert the rows into the base table, recreate the join index
Case 3: Teradata Parallel Data Pump the rows into the base table
Case 4: Insert the rows into the base table using SQL

Case 1: INSERT … SELECT

This method uses the following procedure.

FastLoad the rows into an empty table.
INSERT … SELECT the rows from the freshly loaded table into the table that has a join index defined.

This procedure uses block-at-a-time optimization and is the fastest of the methods examined.

Case 2: Dropping a Join Index and Recreating It After Inserting Rows Into Its Base Table

This method uses the following procedure:

Drop the join index.
Insert the rows into the base table using any method.
Recreate the join index.

The elapsed time for this method averages to about 1.3 times the elapsed time measured for the method provided in Case 1.

Case 3: Teradata Parallel Data Pump

This method uses the Teradata Parallel Data Pump utility to insert rows into the base table with a join index. The Teradata Parallel Data Pump utility has several procedural advantages that make it an attractive option, including restartability and resource throttleability.

Because the Teradata Parallel Data Pump utility performs row-at-a-time operations, however, its performance for this application is not optimal. Measurements confirm that the Teradata Parallel Data Pump utility, when performed using an non-optimized run, is more than 200 times slower than the method of Case 1 with respect to elapsed time measurements. Optimization of the run using techniques like sorting input data in different orders can reduce this figure by an order of magnitude in many circumstances, bringing the cost differential down to a more reasonable 20:1 ratio for a 1 hit per data block situation. As the number of hits per data block increases, the cost differential increases because methods that use block-at-a-time methods become increasingly efficient.

Case 4: Single Row INSERT Request

This method uses the SQL INSERT statement to insert the rows into the base table. The data presented in Maintenance Cost as a Function of Number of Hits Per Data Block confirms that this method is slightly better than the method presented in Case 2.

Comparative Elapsed Times to Insert

The following table indicates the comparative elapsed times required to insert the same number of 100 byte rows into a base table and join index table.

Insertion Method	Operation Performed	Elapsed Time (seconds)
Case 1:	FastLoad into empty table INSERT … SELECT into base table with join index	26 99
Case 1:	Total	125
Case 2:	Drop join index INSERT … SELECT into base table without join index Recreate join index	0 59 104
Case 2:	Total	163
Case 3:	Use the Teradata Parallel Data Pump utility to load rows into a base table with a join index	25,896

Summary Evaluation

The following table presents a summary evaluation of the insert methods examined in this topic.

Insertion Method	Evaluation
Case 1:	Greatest speed
Case 2:	Least advantageous method
Case 3:	Relatively slow Offers other advantages
Case 4:	Simple to perform Greatest efficiency for a small number of rows

Generalizations Derived From These Tests

The following generalizations are made from these test results:

Maintenance costs for aggregate join indexes are much lower than maintenance costs for comparable simple join indexes.
Maintenance costs for no join index are lower than maintenance costs for foreign and ad hoc join indexes.
For example, maintenance costs for a 2-table in-place aggregate join index are 1.1 to 2.5 times greater than maintenance costs computed for just the base table without a join index.
Maintenance costs vary with the type of join index defined.
For example, with an in-place aggregate join index, the higher the number of hits per data block, the less the overhead incurred. This effect is marginal for inserts, for which maintenance costs are already minimal.

On the other hand, for an in-place simple join index at 1 hit per data block, deletes cost 3.4 times more and updates cost 4.5 times more than the case where no join index is defined.

In contrast to the in-place aggregate join index, maintenance costs worsen as the number of hits per data block increase, as indicated by the following graph of CPU path per transaction as a function of number of hits per data block.

In the case of an in-place simple join index, inserts cost less than 1.4 times the maintenance required when no join index is defined.

In all cases, inserts are never more than four times more expensive than the maintenance cost when no join index is defined.