For myriad reasons, some methods of inserting rows into tables are much more efficient than others. This topic examines four insertion methods for how well they keep down the maintenance cost of join indexes:
- Case 1: FastLoad and INSERT … SELECT the rows
- Case 2: Drop the join index, insert the rows into the base table, recreate the join index
- Case 3: Teradata Parallel Data Pump the rows into the base table
- Case 4: Insert the rows into the base table using SQL
Case 1: INSERT … SELECT
This method uses the following procedure.
- FastLoad the rows into an empty table.
- INSERT … SELECT the rows from the freshly loaded table into the table that has a join index defined.
This procedure uses block-at-a-time optimization and is the fastest of the methods examined.
Case 2: Dropping a Join Index and Recreating It After Inserting Rows Into Its Base Table
This method uses the following procedure:
- Drop the join index.
- Insert the rows into the base table using any method.
- Recreate the join index.
The elapsed time for this method averages about 1.3 times the elapsed time measured for the method in Case 1.
Case 3: Teradata Parallel Data Pump
This method uses the Teradata Parallel Data Pump utility to insert rows into a base table that has a join index. The utility has several procedural advantages that make it an attractive option, including restartability and the ability to throttle resource usage.
Because the Teradata Parallel Data Pump utility performs row-at-a-time operations, however, its performance for this application is not optimal. Measurements confirm that a non-optimized Teradata Parallel Data Pump run is more than 200 times slower than the method of Case 1 in elapsed time. Optimizing the run with techniques such as sorting the input data in different orders can reduce this figure by an order of magnitude in many circumstances, bringing the cost differential down to a more reasonable 20:1 ratio at 1 hit per data block. As the number of hits per data block increases, the cost differential increases because block-at-a-time methods become increasingly efficient.
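The relative figures quoted so far can be turned into rough elapsed-time estimates. The following is a minimal sketch, not a Teradata tool: the dictionary keys, the function name `estimate_elapsed_seconds`, and the 100-second baseline are hypothetical and chosen only for illustration. The factor of 200 is a lower bound ("more than 200 times"), and the factor of 20 applies only at 1 hit per data block.

```python
# Relative elapsed-time factors quoted in the text, normalized to Case 1 = 1.0.
# Key names are hypothetical labels, not Teradata identifiers.
RELATIVE_ELAPSED = {
    "case1_fastload_insert_select": 1.0,
    "case2_drop_and_recreate_join_index": 1.3,  # "about 1.3 times" Case 1
    "case3_data_pump_non_optimized": 200.0,     # lower bound: "more than 200 times"
    "case3_data_pump_optimized": 20.0,          # about 20:1 at 1 hit per data block
}


def estimate_elapsed_seconds(case1_seconds):
    """Scale a measured Case 1 elapsed time by each quoted factor."""
    return {
        name: case1_seconds * factor
        for name, factor in RELATIVE_ELAPSED.items()
    }


if __name__ == "__main__":
    # 100.0 seconds is an assumed Case 1 measurement, not a figure from the text.
    for name, seconds in estimate_elapsed_seconds(100.0).items():
        print(f"{name}: {seconds:.0f} s")
```

Because the quoted factors are averages or bounds from one set of tests, the results are best read as orders of magnitude rather than predictions for a particular system.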
Case 4: Single Row INSERT Request
This method uses the SQL INSERT statement to insert the rows into the base table. The data presented in "Maintenance Cost as a Function of Number of Hits Per Data Block" confirms that this method is slightly better than the method presented in Case 2.
Comparative Elapsed Times to Insert
The following table indicates the comparative elapsed times required to insert the same number of 100-byte rows into a base table and its join index table.
|Insertion Method||Operation Performed||Elapsed Time (seconds)|
|Case 1:||FastLoad into empty table; INSERT … SELECT into base table with join index|| |
|Case 2:||Drop join index; INSERT … SELECT into base table without join index; recreate join index|| |
|Case 3:||Use the Teradata Parallel Data Pump utility to load rows into a base table with a join index||25,896|
The following table presents a summary evaluation of the insert methods examined in this topic.
|Case 1:||Greatest speed|
|Case 2:||Least advantageous method|
Generalizations Derived From These Tests
The following generalizations are made from these test results:
- Maintenance costs for aggregate join indexes are much lower than maintenance costs for comparable simple join indexes.
- Maintenance costs when no join index is defined are lower than maintenance costs for foreign and ad hoc join indexes.
For example, maintenance costs for a 2-table in-place aggregate join index are 1.1 to 2.5 times greater than maintenance costs computed for just the base table without a join index.
- Maintenance costs vary with the type of join index defined.
For example, with an in-place aggregate join index, the higher the number of hits per data block, the less the overhead incurred. This effect is marginal for inserts, for which maintenance costs are already minimal.
On the other hand, for an in-place simple join index at 1 hit per data block, deletes cost 3.4 times more and updates cost 4.5 times more than the case where no join index is defined.
In contrast to the in-place aggregate join index, maintenance costs for the in-place simple join index worsen as the number of hits per data block increases, as indicated by the following graph of CPU path per transaction as a function of number of hits per data block.
In the case of an in-place simple join index, inserts cost less than 1.4 times the maintenance required when no join index is defined.
In all cases, inserts are never more than four times more expensive than the maintenance cost when no join index is defined.
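The multipliers quoted for the in-place simple join index at 1 hit per data block can be applied to a measured baseline as follows. This is a minimal sketch under two assumptions: "N times more" is read as "N times the cost with no join index", and the names `SIMPLE_JI_MULTIPLIER` and `cost_with_simple_join_index` are hypothetical. The insert figure is an upper bound, because the text says only "less than 1.4 times".

```python
# Multipliers quoted for an in-place simple join index at 1 hit per data block.
# Assumption: "N times more" means cost = N x the cost with no join index.
SIMPLE_JI_MULTIPLIER = {
    "delete": 3.4,
    "update": 4.5,
    "insert": 1.4,  # upper bound: the text says "less than 1.4 times"
}


def cost_with_simple_join_index(operation, baseline_cost):
    """Scale a no-join-index maintenance cost by the quoted multiplier."""
    return baseline_cost * SIMPLE_JI_MULTIPLIER[operation]


if __name__ == "__main__":
    # A baseline of 1.0 is an arbitrary unit cost chosen for illustration.
    for operation in ("insert", "delete", "update"):
        print(operation, cost_with_simple_join_index(operation, 1.0))
```

These multipliers hold only at 1 hit per data block. As the preceding text notes, costs for the simple join index grow as the number of hits per data block increases.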