Collecting Statistics on Multiple Columns - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
ft:locale
en-US
ft:lastEdition
2024-12-11
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

If you frequently perform queries with search conditions specified on multiple columns, collect statistics on those columns jointly. Statistics collected in this manner permit the Optimizer to more accurately estimate the number of qualifying rows for queries that specify both of those columns. You cannot collect multicolumn statistics on global temporary tables.

The statistics derived for column_1 and column_2 when collected jointly are different from those derived by collecting statistics separately on those same columns. For example, suppose you have two columns named FirstName and LastName. Specifically, the statistics collected on FirstName by itself provide an estimate of the number of individuals having the first name John, for example, and the statistics collected on LastName provide an estimate of the number of individuals having the last name Smith.

These statistics may not provide good estimates of the number of individuals named John Smith. For that, you need to collect statistics on the two columns jointly. This example is provided only to make a clear distinction between collecting statistics on individual columns and collecting statistics on those same columns jointly.

The following table provides a general guideline for determining how to collect statistics on multiple columns.

Set of Unindexed Columns Columns on Which to Collect Statistics
Frequently specified together as an equality condition or WHERE clause join condition Column set jointly
Infrequently specified together as an equality condition or WHERE clause join condition Individual columns separately

Vantage treats multicolumn statistics the same as index statistics if an index has been defined on the same column set.

For example, if you have defined an index on (x1, y1) in table t1, then the following two requests collect the identical statistics and store those statistics in DBC.StatsTbl.

COLLECT STATISTICS ON t1 COLUMN (x1,y1);

COLLECT STATISTICS ON t1 INDEX (x1,y1);

When collecting multicolumn statistics, exclude columns that have null values and appear infrequently in predicates.

For example, suppose you are considering collecting multicolumn statistics on columns 1, 2, and 3, and their values are like the following, where a QUESTION MARK character represents a null.

Column 1 Column 2 Column 3
1 2 3
? 2 3
1 ? 3
1 2 ?
? ? ?
1 2 ?
1 2 ?
1 2 ?
1 2 ?
1 2 ?

Teradata recommends not collecting multicolumn statistics that include column 3 unless column 3 appears frequently in predicates.