15.00 - Number of Rows Per Distinct Value - Teradata Database

Teradata Database Design

prodname: Teradata Database
vrm_release: 15.00
category: User Guide
featnum: B035-1094-015K

Number of Rows Per Distinct Value

This value reports the largest number of rows per primary index column value.

 

FOR this type of primary index …    The value for this measure is …

unique                              always 1.

non‑unique                          better as it approaches 1 and usually, but not necessarily, worse as it gets larger.

If the maximum number of rows per column value is much larger than 1, then the column is often not a good candidate for a primary index. However, a large number of rows per distinct value is not necessarily an indicator that the column set is a poor choice for a primary index, because the evenness of the distribution of rows per value is itself an important factor: the more evenly the rows are spread across the values, the better the resulting row distribution. The severity of the penalty paid for larger values is a function of several variables, including the cardinality of the table and the number of AMPs in the configuration.
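The measure described above can be illustrated with a minimal Python sketch. It assumes the candidate column values have already been extracted into a list; the function name `rows_per_value_stats` and the sample data are hypothetical and are not part of any Teradata tool.

```python
from collections import Counter


def rows_per_value_stats(values):
    """Summarize how many rows share each candidate primary index value."""
    counts = Counter(values)
    if not counts:
        raise ValueError("no column values supplied")
    per_value = list(counts.values())
    total_rows = sum(per_value)
    distinct_values = len(per_value)
    return {
        "total_rows": total_rows,
        "distinct_values": distinct_values,
        # The measure discussed in this topic: the largest number of rows
        # that share a single column value.
        "max_rows_per_value": max(per_value),
        # The average gives a rough sense of how far the maximum sits
        # from an even spread of rows across values.
        "avg_rows_per_value": total_rows / distinct_values,
    }


# Hypothetical sample: one entry per row of the candidate column.
sample = ["CA"] * 6 + ["NY"] * 2 + ["HI"] + ["AZ"]
stats = rows_per_value_stats(sample)
print(stats)
```

Comparing the maximum with the average is one simple way to see whether a large maximum reflects uniformly large groups or a few heavily skewed values.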

In the past, it was commonly stated that if the typical quantity of rows per column value does not fit into a single data block, then the column set is not a good candidate for a primary index. With the significant increase in data block size now used by most sites, this measure is less likely to provide strong guidance in picking a primary index column set.
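The older rule of thumb amounts to simple arithmetic, sketched below. The row size and data block size are hypothetical values chosen only for illustration, and the check ignores block overhead, so treat it as a rough approximation rather than a sizing method.

```python
def fits_in_one_block(rows_per_value, row_bytes, block_bytes):
    """Return True if all rows for one value would fit in a single data block.

    Ignores block headers and other overhead (a simplifying assumption).
    """
    return rows_per_value * row_bytes <= block_bytes


# Hypothetical figures, not taken from the source.
ROW_BYTES = 200            # assumed average row size
BLOCK_BYTES = 64 * 1024    # assumed data block size

small_group_fits = fits_in_one_block(100, ROW_BYTES, BLOCK_BYTES)
large_group_fits = fits_in_one_block(30_000, ROW_BYTES, BLOCK_BYTES)
print(small_group_fits, large_group_fits)
```

With a larger assumed block size, more values pass this check, which is why the rule discriminates less between candidate column sets than it once did.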

You can graph these figures to make the distribution of column values easy to comprehend. In the example provided here, a nonunique primary index (NUPI) on an attribute called State was analyzed. Note the exceedingly skewed distribution, which suggests that State is not a good candidate for the primary index on this table.

The following table indicates the raw and logarithmic row cardinalities per state code value.

 

State Code    Number of Rows    log10(Number of Rows)

Null                  30,000                    4.477
AZ                        70                    1.845
CA                    15,000                    4.176
GA                        30                    1.477
HI                        10                    1.000
IL                        30                    1.477
MI                        30                    1.477
MO                        30                    1.477
NV                        30                    1.477
NY                       100                    2.000
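The logarithmic column in the table can be reproduced, and the skew quantified, with a short Python sketch. The row counts are copied from the table above; the variable names and the text bar chart are illustrative assumptions, and the percentages apply only to the ten values listed.

```python
import math

# Row counts per state code, copied from the table above.
rows_by_state = {
    "Null": 30_000, "AZ": 70, "CA": 15_000, "GA": 30, "HI": 10,
    "IL": 30, "MI": 30, "MO": 30, "NV": 30, "NY": 100,
}

# Base-10 logarithm of each count, rounded to three places as in the table.
log_rows = {state: round(math.log10(n), 3) for state, n in rows_by_state.items()}

total_rows = sum(rows_by_state.values())
max_state = max(rows_by_state, key=rows_by_state.get)

# Share of all listed rows held by the two largest values.
top_two = sorted(rows_by_state.values(), reverse=True)[:2]
top_two_share = sum(top_two) / total_rows

# Rough text bar chart on a logarithmic scale (10 characters per power of 10).
for state, log_n in log_rows.items():
    print(f"{state:<5}{'#' * round(log_n * 10)} {log_n:.3f}")

print(f"Largest value: {max_state}; top two hold {top_two_share:.1%} of rows")
```

Two of the ten values account for nearly all of the listed rows, which is the skew the text describes.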

The following graph displays the row cardinalities on a logarithmic scale. Notice how skewed the distribution is even when displayed on a logarithmic scale: