Performance Effects of Skewed Row Distribution - Teradata Database - Teradata Vantage NewSQL Engine

Teradata Vantage™ - Database Administration

Product
Teradata Database
Teradata Vantage NewSQL Engine
Release Number
16.20
Published
March 2019
Language
English (United States)
Last Update
2019-05-03
dita:mapPath
tgx1512080410608.ditamap
dita:ditavalPath
TD_DBS_16_20_Update1.ditaval
dita:id
ujp1472240543947
Product Category
Software
Teradata Vantage

Uneven distribution of table rows among AMPs (skew) can prevent efficient query processing.

Skewed distribution results in:

  • Poor CPU parallel efficiency on full table scans and bulk inserts
  • Increased I/O for updates and inserts of over-represented values

The effects of a skewed table appear in several types of operations. For example:

  • In full table scans, the AMPs with fewer rows of the target table must wait for the AMPs with disproportionately high numbers of rows to finish. Node CPU utilization reflects these differences because a node is only as fast as the slowest AMP of that node.
  • In the case of bulk inserts to a skewed table, consider the extra burden placed on an AMP with a high number of multiple rows for the same NUPI value.

    For example, assume you have a 5 million row table, with 5,000 rows having the same NUPI value. You are inserting 100,000 rows into that table, with 100 of those insert rows having the same NUPI value. The AMP holding the 5,000 rows with that NUPI value has to perform one half million duplicate row checks (5,000 * 100) for this NUPI. This operation results in poor parallel efficiency.