Performance Effects of Skewed Row Distribution

Database Administration

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Published
July 2021
Language
English (United States)
Last Update
2021-07-27
dita:id
B035-1093
lifecycle
previous
Product Category
Teradata Vantage™

Uneven distribution of table rows among AMPs (skew) can prevent efficient query processing.

Skewed distribution results in:

  • Poor CPU parallel efficiency on full table scans and bulk inserts
  • Increased I/O for updates and inserts of over-represented values
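As a rough illustration of the first point, parallel efficiency can be estimated as the average per-AMP workload divided by the maximum per-AMP workload. The sketch below assumes that row count is a reasonable proxy for work on a full table scan. The function name and the sample row counts are hypothetical and are not taken from Teradata.

```python
def parallel_efficiency(rows_per_amp):
    """Return average rows per AMP divided by the maximum rows on any AMP.

    A value of 1.0 means perfectly even distribution. Lower values mean
    that most AMPs sit idle while the most heavily loaded AMP finishes.
    Assumption: row count approximates the work each AMP does on a scan.
    """
    if not rows_per_amp or max(rows_per_amp) == 0:
        raise ValueError("need at least one AMP holding rows")
    average = sum(rows_per_amp) / len(rows_per_amp)
    return average / max(rows_per_amp)


# Hypothetical 4-AMP system holding 1,000,000 rows in total.
even = [250_000, 250_000, 250_000, 250_000]
skewed = [100_000, 100_000, 100_000, 700_000]

print(round(parallel_efficiency(even), 3))    # 1.0
print(round(parallel_efficiency(skewed), 3))  # 0.357
```

In the skewed case the scan takes as long as the AMP with 700,000 rows needs, so the system does the same total work at roughly a third of its potential parallelism.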

The effects of a skewed table appear in several types of operations. For example:

  • In full table scans, the AMPs with fewer rows of the target table must wait for the AMPs with disproportionately high numbers of rows to finish. Node CPU utilization reflects these differences because a node is only as fast as the slowest AMP of that node.
  • In bulk inserts to a skewed table, consider the extra burden placed on an AMP that holds a large number of rows with the same NUPI value.

    For example, assume a 5-million-row table in which 5,000 rows have the same NUPI value. You insert 100,000 rows into that table, and 100 of the inserted rows have that same NUPI value. The AMP holding the 5,000 existing rows must perform half a million duplicate row checks (5,000 × 100 = 500,000) for this one NUPI value. Because that work falls on a single AMP, the operation has poor parallel efficiency.
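The arithmetic in the bulk insert example can be reproduced with a short sketch. It uses the same simplified model as the text: each inserted row is compared against every existing row with the same NUPI value, and rows added during the insert itself are ignored. The function name is hypothetical.

```python
def duplicate_row_checks(existing_rows, inserted_rows):
    """Estimate duplicate row checks for one NUPI value.

    Simplified model from the example above: every inserted row is
    compared against every existing row that has the same NUPI value.
    Rows added by the insert itself are not counted (an assumption).
    """
    return existing_rows * inserted_rows


# Figures from the example: 5,000 existing rows, 100 inserted rows.
checks = duplicate_row_checks(5_000, 100)
print(checks)  # 500000
```

All of these checks run on the single AMP that owns the NUPI value, which is why the cost does not spread across the system.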