Using HASHROW - Teradata Database

Teradata Database Design

Product
Teradata Database
Release Number
15.10
Language
English (United States)
Last Update
2018-10-06
Product Category
Software

Using HASHROW

HASHROW is especially useful for evaluating the following.

  • Row distribution efficiency of proposed primary index or primary AMP index columns.
  • Relative efficiency of various data types for proposed or existing primary index or primary AMP index columns.
  • The average number of rows that cause hash collisions, which you can compute from the output of HASHROW as shown in the following example set.

    Example  

    This example uses the HASHROW function on a unique primary index.

    Assume that the emp_no column is the UPI for the employee table. To find the number of hash collisions generated by the values in emp_no, submit the following request.

         SELECT (COUNT(*)(FLOAT))/(COUNT(DISTINCT HASHROW(emp_no)))
         FROM employee;

    If there are no hash collisions, the result ratio is close to 1. The larger the returned value, the more hash collisions occur and the less efficient the analyzed column expression is as a primary index.
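The arithmetic behind this ratio can be sketched outside the database. The following Python snippet is a minimal simulation, with a truncated MD5 digest standing in for Teradata's proprietary row-hash function; `row_hash` and the sample `emp_no` values are illustrative assumptions, not Teradata objects.

```python
# Minimal simulation of COUNT(*) / COUNT(DISTINCT HASHROW(emp_no)),
# using a truncated MD5 digest as a stand-in for Teradata's row hash.
import hashlib

def row_hash(value):
    # Illustrative stand-in for HASHROW, not Teradata's algorithm
    return hashlib.md5(str(value).encode()).digest()[:4]

emp_no = list(range(1, 10001))    # a unique (UPI-like) column
ratio = len(emp_no) / len({row_hash(v) for v in emp_no})
print(ratio)   # close to 1.0 when hash collisions are rare
```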

    If a proposed primary index contains non‑unique columns, perform the following check.

    1 Submit the request in the example against the proposed primary index column expression and save the result.

    2 Resubmit the request against a column expression where each index value is unique and save the result.

    3 Compare the result of step 1 against the result of step 2.

    The comparison should indicate clearly which index generates fewer hash collisions.
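The three-step check can likewise be sketched in Python. This hypothetical snippet reuses a truncated MD5 digest as a stand-in for Teradata's row hash and compares the ratio for an invented non-unique column (`dept_no`) against the ratio for a unique column (`emp_no`).

```python
# Compare the collision ratio for a non-unique candidate index column
# (step 1) against a unique column (step 2); MD5 truncated to 4 bytes
# is an illustrative stand-in for Teradata's row-hash function.
import hashlib

def row_hash(value):
    return hashlib.md5(str(value).encode()).digest()[:4]

def collision_ratio(values):
    # COUNT(*) / COUNT(DISTINCT HASHROW(col)) over a column's values
    return len(values) / len({row_hash(v) for v in values})

rows = [(i, i % 50) for i in range(10000)]       # (emp_no, dept_no)

nupi_ratio   = collision_ratio([dept for _, dept in rows])   # step 1
unique_ratio = collision_ratio([emp for emp, _ in rows])     # step 2
print(nupi_ratio, unique_ratio)   # the smaller ratio collides less
```

Note that for a non-unique column, rows with equal values necessarily share a row hash, so the ratio counts value duplication as well as true hash collisions; the unique-column baseline from step 2 is what makes the comparison meaningful.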

    Example  

    The following example uses the HASHROW function against the same primary index column analyzed in the first example, but retyped as DECIMAL(4,3).

    1 Determine how many hash collisions occur with the existing data type by submitting the same query used in the first example.

         SELECT (COUNT(*)(FLOAT))/COUNT(DISTINCT HASHROW(emp_no)) 
         FROM employee;

    2 Determine whether hash collisions would be reduced by changing the data type to DECIMAL(4,3) by submitting the following two requests (both are required because of the restrictions on the DISTINCT option).

         SELECT COUNT(DISTINCT emp_no(DECIMAL(4,3))) 
         FROM employee; 
     
         SELECT COUNT(DISTINCT HASHROW(emp_no(DECIMAL(4,3)))) 
         FROM employee; 

    3 Divide the number returned by the first request by the number returned by the second request to calculate the average number of hash collisions that occur with the new data type.

    4 Compare this average with the result of step 1 to determine whether the change reduces the collision count.
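The arithmetic of steps 2 and 3 can be sketched as follows. This assumes the same truncated-MD5 stand-in for the row hash as in the earlier sketches, and simulates the cast to DECIMAL(4,3) with Python's round(); the sample values are invented so they fit the DECIMAL(4,3) range.

```python
# Simulate step 2's two counts and step 3's division, using round()
# as a stand-in for the DECIMAL(4,3) cast and truncated MD5 as a
# stand-in for Teradata's row-hash function (both illustrative).
import hashlib

def row_hash(value):
    # Illustrative stand-in for HASHROW, not Teradata's algorithm
    return hashlib.md5(str(value).encode()).digest()[:4]

emp_no = [i / 1000 for i in range(1, 5000)]   # values 0.001 .. 4.999

# First request: distinct values after the simulated cast
cast_values = {round(v, 3) for v in emp_no}
distinct_values = len(cast_values)

# Second request: distinct row hashes of those cast values
distinct_hashes = len({row_hash(v) for v in cast_values})

# Step 3: average rows per row-hash value under the new type
avg_collisions = distinct_values / distinct_hashes
print(avg_collisions)   # near 1.0 when the retyped values hash well
```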

    Example  

    When used together, HASHAMP, HASHBUCKET, and HASHROW indicate the distribution of primary rows among the AMPs.

    For example, the following query shows the row distribution of the employee table based on the primary index emp_no, from which you can quickly see whether distribution is even or uneven.

         SELECT HASHAMP(HASHBUCKET(HASHROW(emp_no))), COUNT(*) 
         FROM employee 
         GROUP BY 1 
         ORDER BY 1;
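The hash-to-bucket-to-AMP pipeline behind this query can be sketched outside the database. In the snippet below, `row_hash`, `hash_bucket`, and `hash_amp` are simplified stand-ins for HASHROW, HASHBUCKET, and HASHAMP, and the 8-AMP system size is an assumption for illustration.

```python
# Sketch of HASHAMP(HASHBUCKET(HASHROW(emp_no))): hash each value,
# map the hash to a bucket, the bucket to an AMP, and count per AMP.
import hashlib
from collections import Counter

NUM_AMPS = 8   # assumed system size (illustrative)

def row_hash(value):
    # Illustrative stand-in for HASHROW, not Teradata's algorithm
    return hashlib.md5(str(value).encode()).digest()[:4]

def hash_bucket(h):
    # Stand-in for HASHBUCKET: the high-order 16 bits of the row hash
    return int.from_bytes(h[:2], "big")

def hash_amp(bucket):
    # Stand-in for HASHAMP: a simple bucket-to-AMP map
    return bucket % NUM_AMPS

emp_no = range(1, 10001)
per_amp = Counter(hash_amp(hash_bucket(row_hash(v))) for v in emp_no)
for amp in sorted(per_amp):
    print(amp, per_amp[amp])   # roughly even counts mean good spread
```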
     

    IF the distribution is uneven, THEN try one of the following.

  • Change the primary index to a column set that uniquely identifies each row.
  • Add a column to the primary index to force uniqueness.
  • Change the data type of one or more index columns (see “Hashing and Data Types”).
  • Change the columns of the primary index or primary AMP index to columns that provide more even distribution.

    IF the distribution is even, THEN create your production table, defining the primary index or primary AMP index on the best-performing column set. For example, either of the following, where preliminary performance results are based on trials on a test system.

  • A primary index or primary AMP index based on the column that provides the best performance.
  • A primary index or primary AMP index based on the same column, but with a data type that provides the best performance (see “Hashing and Data Types”).