15.00 - Treating the Rows Null Counts for Composite Indexes - Teradata Database

Teradata Database Design

prodname
Teradata Database
vrm_release
15.00
category
User Guide
featnum
B035-1094-015K

Treating the Rows Null Counts for Composite Indexes

An index can be partly or wholly null. This is almost never a good idea, however. See Chapter 13: “Designing for Missing Information” for information about SQL nulls and the many problems they cause with database integrity.

Consider the following example:

 
  • The first row has both employee and department numbers.
  • The second row has a null employee number and a department number.
  • The third row has an employee number and a null department number.
  • The fourth row has nulls for both its employee number and its department number.
  • All four rows are valid primary index rows. Should they all contribute to the count for the Rows Null value or should they be treated differently?

    Note: Only the first of these 4 rows is a valid primary key because by the entity integrity rule (see “Rules for Primary Keys” on page 92), a primary key cannot contain nulls by definition. This is only one of the many reasons that indexes should not be an issue during logical database design.

    The following table provides recommended treatment guidelines for partially and wholly null composite indexes. Composite indexes are also referred to as multicolumn indexes.

     

    IF a composite index value is …

    THEN you should …

    wholly null

    add it to the Rows Null count.

    partially null

    not add it to the Rows Null count.

    The Optimizer can detect the number of distinct values from the rows that are partially null and it tracks them separately from the rows that are wholly null. If a column has many nulls, but is not usually specified in predicates, you probably should not include that column in the multicolumn statistics you collect.

    For example, suppose you are considering collecting multicolumn statistics on columns 1, 2, and 3 of a table, and their values are like the following, where a QUESTION MARK character represents a null.

     

    You would probably not want to collect multicolumn statistics that include column 3 unless it is specified in a large number of predicates.