Rules for Using HASH BY or LOCAL ORDER BY to Insert Rows

Rules for Using HASH BY or LOCAL ORDER BY to Insert Rows - Teradata Database

SQL Data Manipulation Language

Product

Teradata Database

Release Number

15.10

Language

English (United States)

Last Update

2018-10-06

dita:id

B035-1146

lifecycle

Product Category

Teradata® Database

Rules for Using HASH BY or LOCAL ORDER BY to Insert Rows

The following rules apply:

You can only specify a HASH BY clause if the target table or the underlying target view table is a NoPI table. If the table has a primary index or Primary AMP index, the system returns an error or failure message, depending on the session mode.

If you do not specify a HASH BY clause, the system does not redistribute the source rows before inserting into the target NoPI table.

If a NoPI target table has row partitioning that is different than the source table, the system sorts the source rows locally on the internal partition number, then copies the rows locally into the target table. The internal partition number is computed based on the row‑partitioning of the target table.

If you specify a HASH BY clause with a hash list, the system first redistributes the selected rows by a hash value based on the hash list.

If you also specify a LOCAL ORDER BY clause or if the target table is row-partitioned, the system orders the rows locally and inserts them locally into the target table or underlying target view table.

This is useful if the result of the subquery does not provide an even distribution of the rows. If the target table or underlying target view table is also column‑partitioned, the locally ordered hash redistribution might also be useful to distribute equal values of a column to the same AMP, which might then enable effective autocompression of the column partitions with the columns on which the hash value is calculated.

Because the object you specify for a HASH BY clause is an expression, the expression can be a function call such as RANDOM(n, m).

HASH BY RANDOM(1, 2000000000) is useful to redistribute each individual selected row when there is no particular column set on which to hash distribute the rows, and when a more even distribution is needed than the HASH BY RANDOM clause provides.

A poor choice of a hash expression can lead to a very uneven distribution, similar to when a poor choice is made for primary index columns.

If you specify a HASH BY RANDOM clause, the system first redistributes data blocks of selected rows randomly.

If you also specify a LOCAL ORDER BY clause, the system orders the rows locally and inserts them locally into the target table or underlying target view table.

This is useful if the result of the subquery does not provide an even distribution of rows. Distributing data blocks is more efficient than distributing individual rows and usually provides a nearly even distribution. However, distributing individual rows using an expression like HASH BY RANDOM(1,2000000000) can provide a more even distribution of rows, which might be necessary in some cases.

You can specify a LOCAL ORDER BY clause if the target table or the underlying target view table is a PI, PA, or NoPI table. If you specify a LOCAL ORDER BY clause and the target table is row partitioned, the system orders the selected rows locally according to their internal partition numbers computed based on the row-partitioning of the target table, with the column partition number (if any), as 1, (if the target table or underlying target view table is column partitioned) using the partitioning of the target table or underlying target view table if it has row partitioning, and then the ordering expressions after they have been redistributed if you have also specified a HASH BY clause. The final step is to insert the rows locally into the target table or underlying target view table.

If the target table or underlying target view table is also column‑partitioned, this might allow for more effective autocompression of the column partitions with the columns on which the ordering is done.

Teradata Database first resolves column references specified in a HASH BY hash list to the select expression in the subquery expression list corresponding to a matching column name of the target table or view and, if it does not find a matching column, resolved to a result or underlying column in the subquery per the existing rules of resolving column references in an ORDER BY clause for a SELECT request.