Performance Effects of Skewed Row Distribution

Database Administration

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database, Teradata Vantage
Release Number: 17.20
Published: June 2022
Last edition: 2024-10-04
Product Category: Teradata Vantage™

Uneven distribution of table rows among AMPs (skew) can prevent efficient query processing.

Skewed distribution results in:

  • Poor CPU parallel efficiency on full table scans and bulk inserts
  • Increased I/O for updates and inserts of over-represented values
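A rough way to see why skew hurts CPU parallel efficiency is to treat each AMP's work as proportional to the number of rows it holds. The sketch below assumes that simplification (real per-row CPU and I/O costs vary) and uses hypothetical per-AMP row counts. The function name `parallel_efficiency` is illustrative, not a Teradata API. It divides the average per-AMP load by the heaviest AMP's load, so a perfectly even distribution scores 1.0.

```python
from statistics import mean


def parallel_efficiency(rows_per_amp: list[int]) -> float:
    """Average AMP load divided by the heaviest AMP's load (1.0 = no skew).

    Assumes work is proportional to row count, which is a simplification.
    """
    if not rows_per_amp or max(rows_per_amp) == 0:
        raise ValueError("need at least one AMP with rows")
    return mean(rows_per_amp) / max(rows_per_amp)


# Hypothetical 4-AMP system holding 1,000,000 rows in total.
even = [250_000, 250_000, 250_000, 250_000]
skewed = [100_000, 100_000, 100_000, 700_000]

print(parallel_efficiency(even))    # 1.0
print(parallel_efficiency(skewed))  # about 0.36
```

In the skewed case an all-AMP operation finishes only when the 700,000-row AMP is done, so the other three AMPs sit idle for most of the elapsed time.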

The effects of a skewed table appear in several types of operations. For example:

  • In full table scans, the AMPs with fewer rows of the target table must wait for the AMPs with disproportionately high numbers of rows to finish. Node CPU utilization reflects these differences because a node is only as fast as the slowest AMP of that node.
  • In the case of bulk inserts to a skewed table, consider the extra burden placed on an AMP that holds many rows with the same NUPI value.

    For example, assume you have a 5-million-row table in which 5,000 rows share the same NUPI value. You insert 100,000 rows into that table, and 100 of those new rows have that same NUPI value. The AMP holding the 5,000 existing rows with that NUPI value must perform half a million duplicate row checks (5,000 × 100 = 500,000) for this one NUPI value. Concentrating this work on a single AMP results in poor parallel efficiency.
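The arithmetic in this example can be expressed as a small function. It is a sketch that follows the same simplification as the example: each incoming row with a given NUPI value is compared against every existing row with that value, and comparisons among the incoming rows themselves are ignored. The function name `duplicate_row_checks` and the placeholder values `"hot"` and `"typical"` are hypothetical.

```python
from collections import Counter


def duplicate_row_checks(existing: Counter, incoming: Counter) -> dict:
    """Per-NUPI-value count of duplicate row checks for a bulk insert.

    Simplified model: checks = existing rows with the value * incoming rows
    with the value. Checks among the incoming rows themselves are ignored.
    """
    # Counter returns 0 for values not present in the existing table.
    return {value: existing[value] * n for value, n in incoming.items()}


# Figures from the example: 5,000 existing rows and 100 incoming rows share
# one NUPI value ("hot"); "typical" is a hypothetical well-distributed value.
existing = Counter({"hot": 5_000, "typical": 1})
incoming = Counter({"hot": 100, "typical": 1})

checks = duplicate_row_checks(existing, incoming)
print(checks["hot"])      # 500000
print(checks["typical"])  # 1
```

Because all rows with the same NUPI value hash to the same AMP, the 500,000 checks for the over-represented value land on one AMP, while AMPs holding well-distributed values do far less work.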