Join Index Maintenance Cost as Function of Hits for Each Data Block - Teradata VantageCloud Lake

Lake - Database Reference

Deployment
VantageCloud
Edition
Lake
Product
Teradata VantageCloud Lake
Release Number
Published
February 2025
ft:locale
en-US
ft:lastEdition
2025-11-21
dita:mapPath
ohi1683672393549.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
ohi1683672393549

The number of hits per block is the number of data block rows accessed (inserted, deleted, or updated) during a table operation. Typically, the more hits per block, the better the performance, if the hits can be combined into one data block update (for example, with an INSERT ... SELECT for a set of updates containing multiple rows).

If a large number of the data blocks for a table have become significantly smaller than half the maximum size for the defined maximum data block size, an appropriate specification for the MERGEBLOCKRATIO option of the ALTER TABLE and CREATE TABLE statements can enable the file system to merge up to 8 data blocks into a single larger data block. For more information, see ALTER TABLE and CREATE TABLE Syntax.

For Object File System tables, you cannot change parameters related to data block size.

However, if each data block hit involves a separate read operation, with hits occurring semi-randomly, performance is worse.

Data Processing Scenarios for the Hit Rates Studied

The following table provides data processing scenarios that correspond with the hit rates studied.

Number of Hits per Data Block Percentage of Rows Touched Scenario
1 0.2 Update a table that contains 500 days of data (roughly 1.5 years) with one typical day of data.
5 1.0 Insert into a table that contains 104 weeks of data (2 years) one typical week of data.
20 4.0 Delete from a table that contains 104 weeks of data (2 years) one typical month (4 weeks) of data.

Standard Test Procedure

The standard procedure for these tests was to make any maintenance changes to the left table in the join.

Maintenance Costs in Terms of Elapsed Time

Elapsed times increase as a function of increased hits per data block because more rows are touched. At the same time, the CPU path per row times decrease because the elapsed times increase at a lesser rate than the increase in the number of rows touched.

Maintenance Costs in Terms of CPU Path Length Per Transaction

Suppose you report the same information in terms of CPU path length per transaction. For this data, the term transaction equates to qualifying row. This number is a measure of the amount of CPU time a transaction requires; roughly, the number of instructions performed per transaction.

CPU path per transaction is a better way to compare manipulations than elapsed time for two reasons:
  • CPU path per transaction figures include both CPU usage information and elapsed time.
  • The measure is normalized, and therefore factors out configuration-specific variables such as number of CPUs per node and number of nodes, making the numbers comparable across all configurations.