Index Considerations | Database Design | VantageCloud Lake - Index Considerations - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905
Evaluate the following considerations when determining which columns to use for an index, and which data types to assign to those columns:
  • For the Optimizer to choose the most efficient steps for processing a query, you must collect statistics regularly on all indexed columns and on all columns frequently specified in query predicates. This makes sure that the Optimizer has the most current information on the database demographics. See COLLECT STATISTICS (Optimizer Form).
  • When multiple rows have the same hash value (hash collisions) for a PI or PA, the AMPs may receive an uneven number of rows. Use the Teradata hash-related functions to check whether PI and PA candidate columns would cause a skewed row distribution. For more information on hash functions, see Hash Functions to Evaluate PI and PA Candidates.
  • Suitability of the data type for indexing:
    • Numeric data types are compact and often tend toward multiple distinct values and fewer rows per distinct value, leading to fewer hash collisions.

      The data type of a column can affect the way column values are hashed. The Teradata hashing algorithm operates on the internal bit pattern representation of the data, rather than on the externally visible data values.

      For example, a numeric value stored as a DECIMAL type with precision of one produces a different hash value than the same numeric values stored as DECIMAL with a different precision. An INTEGER type requires only four bytes of storage, but the same number stored as DECIMAL can take up to eight bytes, depending on the size of the number.

    • Character data and byte types tend not to be compact or, if the length is small, tend to have few distinct values (leading to hash collisions).
    • Columns with complex data types (BLOB, CLOB, DATASET, XML, JSON, Period, ARRAY, and VARRAY) cannot be index columns. A column with a geospatial data type can be an index column for a NUSI, if no other columns are included.
    • Join and predicate conversions: If joined columns do not have identical data types and precisions, the values from one column must be converted to match the type of the other. This is also true for comparisons made in the WHERE clause. The conversion process causes poor query performance, reducing the benefits of indexing.