Skewed Query Processing Across AMPs | Teradata Vantage - Skewed Query Processing Across AMPs - Advanced SQL Engine

Skewed Query Processing Across AMPs | Teradata Vantage - Skewed Query Processing Across AMPs - Advanced SQL Engine - Teradata Database

Database Administration

Product

Advanced SQL Engine

Teradata Database

Release Number

17.05

17.00

Published

June 2020

Language

English (United States)

Last Update

2021-01-22

dita:mapPath

rgu1556127906220.ditamap

dita:ditavalPath

lze1555437562152.ditaval

dita:id

B035-1093

lifecycle

Product Category

Teradata Vantage™

AMP vprocs always run in parallel, but the way data rows or column values are striped across the disks affect the parallel operation of AMP step processing.

Unbalanced, or skewed, or spiked, disk loads can cause one or a few AMPs to be doing most of the I/Os. For example, when a numeric column allows zeros and/or nulls, the majority of rows might hash to the same AMP.

If your disk loads are poorly balanced, discuss with operations ways to correct the situation. For example:

Perhaps queries or views against a column with zeros/nulls could use “WHERE NOT NULL” or “NOT= 0” qualifiers.
If the cause is a NUPI, consider redefining the index, especially for a very large table, to achieve a higher percentage of uniqueness.

Common Skew Issues

Issue	Cause	Results
Same row hash value for an excessive number of rows Rows with same row hash value cannot fit in a single data block. Rows spill over into additional data blocks.	The table is skewed due to a highly nonunique primary index (PI). As an estimate, more than 1000 occurrences/NUPI value begin to cause performance degradation problems. This figure is based on all the rows for the same NUPI value spilling over into more than five data blocks. See Identifying and Fixing Skewed Tables.	Increased I/Os for updates. Increased compares for inserts and FastLoad (more Central Processing Unit (CPU) and I/Os). Performance degradation in the Restore and Table Rebuild utilities.
The data block size is set too low. See Managing Data Block Usage. Data block size should not be a problem on systems using block-level compression because the size should be set to the maximum.
Some AMPs have many more rows of a table than do other AMPs.	One or a few NUPI values have many more rows than all the other NUPI values. A highly skewed join field in a join or aggregation, for example a join on a city code in a customer table where a large number of customers may be located in a few cities. A highly skewed referencing column (for instance, a city code) when using referential integrity (RI).	Poor CPU parallel efficiency on the AMPs during full table scans and bulk inserts. Increased I/O for updates, and increased compares for inserts (also more I/O).

Issue

Cause

Results

Same row hash value for an excessive number of rows

Rows with same row hash value cannot fit in a single data block.
Rows spill over into additional data blocks.

The table is skewed due to a highly nonunique primary index (PI).

As an estimate, more than 1000 occurrences/NUPI value begin to cause performance degradation problems. This figure is based on all the rows for the same NUPI value spilling over into more than five data blocks.

See Identifying and Fixing Skewed Tables.

Increased I/Os for updates.
Increased compares for inserts and FastLoad (more Central Processing Unit (CPU) and I/Os).
Performance degradation in the Restore and Table Rebuild utilities.

The data block size is set too low.

See Managing Data Block Usage.

Data block size should not be a problem on systems using block-level compression because the size should be set to the maximum.

Some AMPs have many more rows of a table than do other AMPs.

One or a few NUPI values have many more rows than all the other NUPI values.
A highly skewed join field in a join or aggregation, for example a join on a city code in a customer table where a large number of customers may be located in a few cities.
A highly skewed referencing column (for instance, a city code) when using referential integrity (RI).

Poor CPU parallel efficiency on the AMPs during full table scans and bulk inserts.
Increased I/O for updates, and increased compares for inserts (also more I/O).