Rules for Using HASH BY or LOCAL ORDER BY to Insert Rows - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
ft:locale
en-US
ft:lastEdition
2024-12-11
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905
The following rules apply:
  • You can only specify a HASH BY clause if the target table or the underlying target view table is a NoPI table. If the table has a primary index or Primary AMP index, the system returns an error or failure message, depending on the session mode.
  • If you do not specify a HASH BY clause, the system does not redistribute the source rows before inserting into the target NoPI table.

    If a NoPI target table has row partitioning that is different than the source table, the system sorts the source rows locally on the internal partition number, then copies the rows locally into the target table. The internal partition number is computed based on the row-partitioning of the target table.

  • If you specify a HASH BY clause with a hash list, the system first redistributes the selected rows by a hash value based on the hash list.

    If you also specify a LOCAL ORDER BY clause, or if the target table is row-partitioned, the system orders the rows locally and inserts them locally into the target table or underlying target view table.

    This is useful if the result of the subquery does not provide an even distribution of the rows. If the target table or underlying target view table is also column-partitioned, the locally ordered hash redistribution may also be useful to distribute equal values of a column to the same AMP, which may then enable effective autocompression of the column partitions with the columns on which the hash value is calculated.

    Because the object you specify for a HASH BY clause is an expression, the expression can be a function call such as RANDOM(n, m).

    HASH BY RANDOM(1, 2000000000) is useful to redistribute each individual selected row when there is no column set on which to hash distribute the rows, and when a more even distribution is needed than the HASH BY RANDOM clause provides.

    A poor hash expression can cause an uneven distribution.

  • If you specify a HASH BY RANDOM clause, the system first redistributes data blocks of selected rows randomly.

    If you also specify a LOCAL ORDER BY clause, the system orders the rows locally and inserts them locally into the target table or underlying target view table.

    This is useful if the result of the subquery does not provide an even distribution of rows. Distributing data blocks is more efficient than distributing individual rows and usually provides a nearly even distribution. However, distributing individual rows using an expression like HASH BY RANDOM(1,2000000000) can provide a more even distribution of rows.

  • You can specify a LOCAL ORDER BY clause if the target table or underlying target view table is a PI, PA, or NoPI table.

    If you specify a LOCAL ORDER BY clause and the target table is row-partitioned, the system orders the selected rows locally according to their internal partition numbers. The system computes the partition numbers based on the row partitioning of the target table, using the partitioning of the target table or underlying target view table if row-partitioned, and then the ordering expressions after they have been redistributed if you specified a HASH BY clause. If the target table or underlying target view table is column-partitioned, the column partition number is 1. The final step is to insert the rows locally into the target table or underlying target view table.

    If the target table or underlying target view table is also column-partitioned, this may allow for more effective autocompression of the column partitions with the columns on which the ordering is done.

  • The database first resolves column references specified in a HASH BY hash list to the select expression in the subquery expression list corresponding to a matching column name of the target table or view. If there is no matching column, the database resolves the column references to a result column or underlying column in the subquery according to the existing rules of resolving column references in an ORDER BY clause for a SELECT request.