SQL Scripts for Detecting Skew - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
ft:locale
en-US
ft:lastEdition
2024-12-11
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905
Different measures of skew can be made.
  • Rows per value
  • Rows per hash bucket
  • Rows per AMP
  • Rows per row partition
  • Rows per row partition per AMP
  • Rows per row partition per hash bucket
  • Rows per row partition per value

Each type of skew can have a different effect on the query plan the Optimizer chooses, as does the concentration of the relevant rows within the data blocks.

If you are analyzing the demographics of an existing table, whether production or prototype, you can use the following set of useful scripts written by the Teradata technical support team to check for data skew. You can adjust the details of these statement to suit the needs of your site.

Run these queries when new applications are being loaded on the system and after significant data changes.

The following query identifies tables that are not evenly distributed. Ideal variance is less than 5%. In this query, variance is set to 1000%, which indicates AMPs with no rows from the table in question. You can use the BTEQ command RETLIMIT to limit the number of rows returned.

SELECT (MAX(CurrentPerm) - MIN(CurrentPerm)) * 100
        /(NULLIF(MIN(currentperm),0))(NAMED variance)
       (FORMAT 'zzzzz9.99%'),MAX(CurrentPerm)(TITLE 'Max')
       (FORMAT 'zzz,zzz,zzz,999'),MIN(currentperm)
       (TITLE 'Min')(FORMAT 'zzz,zzz,zzz,999'),
        TRIM(DatabaseName)||'.'||TableName (NAMED Tables)
FROM DBC.TablesizeV
GROUP BY DatabaseName, TableName
HAVING SUM(CurrentPerm) > 1000000
AND    variance > 1000
WHERE DatabaseName NOT IN('CrashDumps','DBC')
ORDER BY Tables;

Use the following query to display the detailed distribution of a table that has been identified as having a skewed distribution:

SELECT vproc, CurrentPerm
FROM DBC.TablesizeV
WHERE DatabaseName = 'databasename'
AND   TableName = 'tablename'
ORDER BY 1;

The following query reports the row distribution of a table by AMP:

SELECT dt1.a (TITLE 'AMP'), dt1.b (TITLE 'Rows'),
      ((dt1.b/dt2.x (FLOAT)) - 1.0)*100 (FORMAT '+++9%',
      TITLE ’Deviation’)
FROM (SELECT HASHAMP(HASHBUCKET(HASHROW(index))),COUNT(*)
      FROM databasename.tablename
      GROUP BY 1) AS dt1 (a,b),
     (SELECT (COUNT(*) / (HASHAMP()+1)(FLOAT))
      FROM databasename.tablename) AS dt2(x)
ORDER BY 2 DESC,1;

The following query reports the distribution by AMP of the specified index or column.

SELECT HASHAMP(HASHBUCKET(HASHROW({index | column}))) ,COUNT(*)
FROM databasename.tablename
GROUP BY 1
ORDER BY 2 DESC;

The following query reports the number of row hash collisions for the specified index or column.

SELECT HASHROW({index | column}), COUNT(*)
FROM databasename.tablename
GROUP BY 1
ORDER BY 1
HAVING COUNT(*) > 10;

The following query reports the number of AMPs and the number of rows a given query accesses.

LOCKING TABLE tablename FOR ACCESS
SELECT COUNT(dt.ampNum)(TITLE '#AMPS'),
       SUM(dt.numRows)(TITLE '#ROWS')
FROM (SELECT HASHAMP(HASHBUCKET(HASHROW(<index>))), COUNT(*)
      FROM tablename
      WHERE selection_criteria
      GROUP BY 1) AS dt (ampNum, numRows);