- In order for the Optimizer to choose the most efficient steps for processing a query, it is important that you collect statistics regularly on all indexed columns and on all columns frequently specified in query predicates. This ensures that the Optimizer has the most current information on the database demographics. For more information on the COLLECT STATISTICS (Optimizer Form) statement, see SQL Data Definition Language - Syntax and Examples.
- When there are many rows that have the same hash value (hash collisions) for a PI or PA, the AMPs might receive an uneven number of rows. Use the Teradata Database hash-related functions to check whether PI and PA candidate columns would result in such a skewed row distribution. For more information on hash functions, see Hash Functions to Evaluate PI and PA Candidates.
- Suitability of the data type for indexing:
- Numeric data types are often best because they are compact and often
tend toward very many distinct values and fewer rows per distinct value,
thus leading to fewer hash collisions.
The data type of a column can affect the way column values are hashed. The Teradata Database hashing algorithm operates on the internal bit pattern representation of the data, rather than on the externally visible data values.
For example, a numeric value stored as a DECIMAL type with precision of one produces a different hash value than the same numeric values stored as DECIMAL with a different precision. An INTEGER type requires only four bytes of storage, but the same number stored as DECIMAL can take up to eight bytes, depending on the size of the number.
- Character data and byte types may be less suitable choices for index columns because they tend not to be compact or, if the length is small, tend to have few distinct values (leading to hash collisions).
- Columns with complex data types (BLOB, CLOB, DATASET, XML, JSON, Period, ARRAY, and VARRAY) cannot be index columns. A column with a geospatial data type can be an index column for a NUSI, if no other columns are included.
- Join and predicate conversions: If joined columns do not have identical data types and precisions, the values from one column must be converted to match the type of the other. This is also true for comparisons made in the WHERE clause. In these cases, the conversion process results in poor query performance reducing the benefits of indexing.
- Numeric data types are often best because they are compact and often tend toward very many distinct values and fewer rows per distinct value, thus leading to fewer hash collisions.
You should evaluate the following considerations when determining which columns to use for an index, and which data types to assign to those columns: