Performance Effects of Skewed Row Distribution | Teradata Vantage - Advanced SQL Engine

Database Administration

Product

Advanced SQL Engine

Teradata Database

Release Number

17.10

Published

July 2021

Language

English (United States)

Last Update

2021-07-27

dita:id

B035-1093

Product Category

Teradata Vantage™

Uneven distribution of table rows among AMPs (skew) can prevent efficient query processing.

Skewed distribution results in:

Poor CPU parallel efficiency on full table scans and bulk inserts
Increased I/O for updates and inserts of over-represented values
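
One way to make "parallel efficiency" concrete is to compare the average number of rows per AMP with the number of rows on the busiest AMP. The Python sketch below is illustrative only: the function name `parallel_efficiency` and the sample row counts are assumptions made for this example, not Teradata tooling or measured values, and the average-over-maximum ratio is a common convention rather than a formula stated in this topic.

```python
def parallel_efficiency(rows_per_amp):
    """Return average rows per AMP divided by the rows on the busiest AMP.

    1.0 means a perfectly even distribution; values near 0 mean one AMP
    holds most of the rows while the others sit idle waiting for it.
    """
    if not rows_per_amp or max(rows_per_amp) == 0:
        raise ValueError("need at least one AMP that holds rows")
    average = sum(rows_per_amp) / len(rows_per_amp)
    return average / max(rows_per_amp)


# Hypothetical 4-AMP system holding the same 4,000 rows in two ways.
even_table = [1000, 1000, 1000, 1000]
skewed_table = [400, 400, 400, 2800]

print(parallel_efficiency(even_table))              # 1.0
print(round(parallel_efficiency(skewed_table), 3))  # 0.357
```

In the skewed case the scan takes as long as the 2,800-row AMP needs, so roughly two thirds of the available parallel capacity goes unused.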

The effects of a skewed table appear in several types of operations. For example:

In full table scans, AMPs that hold fewer rows of the target table finish early and then wait for the AMPs that hold disproportionately many rows. Node CPU utilization reflects these differences because a node is only as fast as its slowest AMP.
In bulk inserts to a skewed table, extra work falls on any AMP that already holds many rows with the same NUPI value.
For example, assume a 5-million-row table in which 5,000 rows share one NUPI value. You insert 100,000 rows into that table, and 100 of the inserted rows carry that same NUPI value. The AMP that holds the 5,000 existing rows must perform about half a million duplicate row checks (5,000 * 100 = 500,000) for this one NUPI value, while the other AMPs have far less to do. The result is poor parallel efficiency.
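
The arithmetic in the bulk insert example can be checked with a short Python sketch. It uses the same simplification as the text: each inserted row is compared with every existing row that has the same NUPI value. The function name `duplicate_row_checks` is hypothetical, and the second, low-duplication case (5 existing rows, 1 inserted row) is an assumed figure added only for contrast.

```python
def duplicate_row_checks(existing_rows, inserted_rows):
    """Estimate duplicate row checks for one NUPI value.

    Simplified model from the text: every inserted row is compared
    against every existing row that shares its NUPI value.
    """
    return existing_rows * inserted_rows


# Figures from the example: 5,000 existing rows and 100 inserted rows
# share one NUPI value.
hot_value_checks = duplicate_row_checks(5_000, 100)
print(hot_value_checks)  # 500000

# Assumed contrast case: a NUPI value with 5 existing rows and 1 inserted row.
typical_value_checks = duplicate_row_checks(5, 1)
print(hot_value_checks // typical_value_checks)  # 100000
```

Under these assumptions, the single over-represented value costs its AMP 100,000 times as many checks as a lightly duplicated value, which is why one hot NUPI value can dominate the elapsed time of the whole insert.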