17.10 - Maintenance Cost as Function of Hits for Each Data Block - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - Database Design

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Release Date
July 2021
Content Type
User Guide
Publication ID
B035-1094-171K
Language
English (United States)

The number of hits per block is a measure of how many rows in a data block are accessed (inserted, deleted, or updated) during an operation on a table. Generally speaking, the greater the number of hits per block, the better the performance provided the hits can be combined into one update of the data block, as would be the case, for example, with an INSERT ... SELECT involving a set of updates containing multiple rows.

If a large number of the data blocks for a table have become significantly smaller than half the maximum size for the defined maximum data block size, an appropriate specification for the MERGEBLOCKRATIO option of the ALTER TABLE and CREATE TABLE statements can enable the file system to merge up to 8 data blocks into a single larger data block. For more information, see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.

On the other hand, if each hit of a data block involves a separate read operation, with hits occurring semi-randomly, performance is worse.

Data Processing Scenarios for the Hit Rates Studied

The following table provides example data processing scenarios that correspond with the hit rates studied.

Number of Hits per Data Block Percentage of Rows Touched Scenario
             1            0.2 Update a table that contains 500 days of data (roughly 1.5 years) with one typical day of data.
             5            1.0 Insert into a table that contains 104 weeks of data (2 years) one typical week of data.
           20            4.0 Delete from a table that contains 104 weeks of data (2 years) one typical month (4 weeks) of data.

Standard Test Procedure

The standard procedure for these tests was to make any maintenance changes to the left table in the join.

Maintenance Costs In Terms of Elapsed Time

Elapsed times increase as a function of increased hits per data block because more rows are touched. At the same time, the CPU path per row times decrease because the elapsed times increase at a lesser rate than the increase in the number of rows touched.

Maintenance Costs In Terms of CPU Path Length Per Transaction

Suppose you report the same information in terms of CPU path length per transaction. For this data, the term transaction equates to qualifying row. This number is a measure of the amount of CPU time a transaction requires; roughly, the number of instructions performed per transaction.

CPU path per transaction is a better way to compare various manipulations than elapsed time for two reasons:
  • CPU path per transaction figures include both CPU utilization information and elapsed time.
  • Because the measure is normalized, it factors out configuration-specific variables such as number of CPUs per node and number of nodes, making the numbers comparable across all configurations.