Uneven distribution of table rows among AMPs (skew) can prevent efficient query processing.
Skewed distribution results in:
- Poor CPU parallel efficiency on full table scans and bulk inserts
- Increased I/O for updates and inserts of over-represented values
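One way to put a number on parallel efficiency is to divide the average per-AMP workload by the workload of the busiest AMP. The sketch below is illustrative only: it assumes that row count is a reasonable proxy for scan work, and the function name and sample row counts are hypothetical rather than taken from any Teradata tool.

```python
def parallel_efficiency(rows_per_amp):
    """Average AMP load divided by the busiest AMP's load (1.0 means perfectly even)."""
    busiest = max(rows_per_amp)
    if busiest == 0:
        return 1.0  # an empty table is trivially balanced
    average = sum(rows_per_amp) / len(rows_per_amp)
    return average / busiest


# Hypothetical 4-AMP system holding the same 4,000 rows in two layouts.
even = [1000, 1000, 1000, 1000]
skewed = [400, 400, 400, 2800]

print(f"even:   {parallel_efficiency(even):.2f}")
print(f"skewed: {parallel_efficiency(skewed):.2f}")
```

In the skewed layout, three AMPs finish early and sit idle while the fourth works through its 2,800 rows, so the ratio falls well below 1.0.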
The effects of a skewed table appear in several types of operations. For example:
- In full table scans, the AMPs with fewer rows of the target table must wait for the AMPs with disproportionately high numbers of rows to finish. Node CPU utilization reflects these differences because a node is only as fast as the slowest AMP of that node.
- In bulk inserts to a skewed table, consider the extra burden placed on an AMP that already holds many rows with the same NUPI value.
For example, assume you have a 5 million row table in which 5,000 rows have the same NUPI value. You insert 100,000 rows into that table, and 100 of the inserted rows have that same NUPI value. The AMP holding the 5,000 rows has to perform 500,000 duplicate row checks (5,000 * 100) for this one NUPI value. Because all of that work falls on a single AMP, the operation has poor parallel efficiency.
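The arithmetic in this example can be reproduced with a short sketch. It follows the simplification used above, in which each inserted row is compared against every row that already held the NUPI value before the load started; the function name is hypothetical.

```python
def duplicate_row_checks(existing_rows, inserted_rows):
    """Estimate duplicate row checks for one NUPI value.

    Assumption (matches the 5,000 * 100 arithmetic in the text): every
    inserted row is compared against each pre-existing row with that value.
    """
    return existing_rows * inserted_rows


# Figures from the example: 5,000 existing rows, 100 inserted rows.
checks = duplicate_row_checks(5_000, 100)
print(checks)  # 500000
```

Because the estimate is a product, it grows quickly when both the existing and the incoming row counts for a single value are large.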