If you frequently perform queries with search conditions specified on multiple columns, collect statistics on those columns jointly. Statistics collected in this manner permit the Optimizer to more accurately estimate the number of qualifying rows for queries that specify both of those columns. You cannot collect multicolumn statistics on global temporary tables.
The statistics derived for column_1 and column_2 when collected jointly are different from those derived by collecting statistics separately on those same columns. For example, suppose you have two columns named FirstName and LastName. Specifically, the statistics collected on FirstName by itself provide an estimate of the number of individuals having the first name John, for example, and the statistics collected on LastName provide an estimate of the number of individuals having the last name Smith.
These statistics may not provide good estimates of the number of individuals named John Smith. For that, you need to collect statistics on the two columns jointly. This example is provided only to make a clear distinction between collecting statistics on individual columns and collecting statistics on those same columns jointly.
The following table provides a general guideline for determining how to collect statistics on multiple columns.
Set of Unindexed Columns | Columns on Which to Collect Statistics |
---|---|
Frequently specified together as an equality condition or WHERE clause join condition | Column set jointly |
Infrequently specified together as an equality condition or WHERE clause join condition | Individual columns separately |
Vantage treats multicolumn statistics the same as index statistics if an index has been defined on the same column set.
For example, if you have defined an index on (x1, y1) in table t1, then the following two requests collect the identical statistics and store those statistics in DBC.StatsTbl.
COLLECT STATISTICS ON t1 COLUMN (x1,y1); COLLECT STATISTICS ON t1 INDEX (x1,y1);
When collecting multicolumn statistics, exclude columns that have null values and appear infrequently in predicates.
For example, suppose you are considering collecting multicolumn statistics on columns 1, 2, and 3, and their values are like the following, where a QUESTION MARK character represents a null.
Column 1 | Column 2 | Column 3 |
---|---|---|
1 | 2 | 3 |
? | 2 | 3 |
1 | ? | 3 |
1 | 2 | ? |
? | ? | ? |
1 | 2 | ? |
1 | 2 | ? |
1 | 2 | ? |
1 | 2 | ? |
1 | 2 | ? |
Teradata recommends not collecting multicolumn statistics that include column 3 unless column 3 appears frequently in predicates.