Skewed Query Processing Across AMPs | Teradata Vantage - Skewed Query Processing Across AMPs - Analytics Database - Teradata Vantage

Database Administration

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Teradata Vantage
Release Number
17.20
Published
June 2022
Language
English (United States)
Last Update
2024-10-04
dita:mapPath
pgf1628096104492.ditamap
dita:ditavalPath
qkf1628213546010.ditaval
dita:id
ujp1472240543947
lifecycle
latest
Product Category
Teradata Vantage™

AMP vprocs always run in parallel, but the way data rows or column values are striped across the disks affect the parallel operation of AMP step processing.

Unbalanced, or skewed, or spiked, disk loads can cause one or a few AMPs to be doing most of the I/Os. For example, when a numeric column allows zeros and/or nulls, the majority of rows might hash to the same AMP.

If your disk loads are poorly balanced, discuss with operations ways to correct the situation. For example:

  • Perhaps queries or views against a column with zeros/nulls could use “WHERE NOT NULL” or “NOT= 0” qualifiers.
  • If the cause is a NUPI, consider redefining the index, especially for a very large table, to achieve a higher percentage of uniqueness.

Common Skew Issues

Issue Cause Results
Same row hash value for an excessive number of rows
  • Rows with same row hash value cannot fit in a single data block.
  • Rows spill over into additional data blocks.
The table is skewed due to a highly nonunique primary index (PI).

As an estimate, more than 1000 occurrences/NUPI value begin to cause performance degradation problems. This figure is based on all the rows for the same NUPI value spilling over into more than five data blocks.

See Identifying and Fixing Skewed Tables.

  • Increased I/Os for updates.
  • Increased compares for inserts and FastLoad (more Central Processing Unit (CPU) and I/Os).
  • Performance degradation in the Restore and Table Rebuild utilities.
The data block size is set too low.

See Managing Data Block Usage.

Data block size should not be a problem on systems using block-level compression because the size should be set to the maximum.
Some AMPs have many more rows of a table than do other AMPs.
  • One or a few NUPI values have many more rows than all the other NUPI values.
  • A highly skewed join field in a join or aggregation, for example a join on a city code in a customer table where a large number of customers may be located in a few cities.
  • A highly skewed referencing column (for instance, a city code) when using referential integrity (RI).
  • Poor CPU parallel efficiency on the AMPs during full table scans and bulk inserts.
  • Increased I/O for updates, and increased compares for inserts (also more I/O).