Teradata Database Design
Teradata Database, Release 15.00
User Guide, B035-1094-015K

Row Allocation for INSERT … SELECT Operations Into NoPI Tables and Column‑Partitioned Tables

Because NoPI tables and column‑partitioned tables do not have a primary index, Teradata Database randomly distributes their individual rows or blocks of rows to the AMPs or copies them locally rather than hash‑distributing them in the way that rows are allocated for primary‑indexed tables.

A common way to load multiple rows into NoPI tables and column‑partitioned tables is to use INSERT … SELECT requests. In addition to loading rows into such tables, you can specify the HASH BY and LOCAL ORDER BY clauses to distribute and sort the rows from the SELECT subquery of an INSERT … SELECT request before they are inserted into a NoPI or column‑partitioned table. You can specify these options individually or together within a request, but only if the target table or underlying table of the target view does not have a primary index. See SQL Data Manipulation Language for details about the syntax and usage of the HASH BY and LOCAL ORDER BY options with INSERT … SELECT requests.

The HASH BY option redistributes the selected rows by the hash of the expression you specify. This is useful if the result of the SELECT subquery does not by itself provide an even distribution. You can follow a HASH BY specification with a LOCAL ORDER BY option that orders the rows locally before they are inserted locally into the target table or underlying table of the target view.

If the target table or underlying table of the target view is also column‑partitioned, you can use these options to send equal values of a column to the same AMP, which might enable effective autocompression of the column partitions for the columns on which the hash value is computed.

Specifying HASH BY RANDOM redistributes the data blocks of the selected rows randomly, and is a handy way to redistribute rows when there is no particular column set on which to hash distribute them. Distributing data blocks is more efficient than distributing individual rows and generally provides a very even distribution; however, distributing individual rows sometimes provides a more even distribution.

Specifying HASH BY RANDOM(1,2000000000), for example, redistributes individual rows rather than data blocks of rows, which is useful for cases where row‑level redistribution provides a more even distribution.

Assume the following definitions for the examples that follow.

     CREATE TABLE orders (
       o_orderkey    INTEGER NOT NULL,
       o_custkey     INTEGER,
       o_orderstatus CHARACTER(1) CASESPECIFIC,
       o_totalprice  DECIMAL(13,2) NOT NULL,
       o_ordertsz    TIMESTAMP(6) WITH TIME ZONE NOT NULL,
       o_comment     VARCHAR(79))
     PARTITION BY COLUMN
     UNIQUE INDEX (o_orderkey);
 
     CREATE TABLE orders_staging AS orders
     WITH NO DATA
     NO PRIMARY INDEX;
 
     CREATE TABLE tpi (
       a INTEGER,
       b INTEGER,
       c INTEGER)
     PRIMARY INDEX (a);
 
     CREATE TABLE tnopi_1 (
       a INTEGER,
       b INTEGER,
       c INTEGER)
     NO PRIMARY INDEX;
 
     CREATE TABLE tnopi_2 (
       a INTEGER,
       b INTEGER,
       c INTEGER)
     NO PRIMARY INDEX;

The following INSERT … SELECT request selects rows from the NoPI table orders_staging and redistributes them randomly as data blocks before copying them locally into the column‑partitioned table orders. This request does not specify a LOCAL ORDER BY clause to order the selected rows locally before inserting them into the target table orders.

     INSERT INTO orders
       SELECT *
       FROM orders_staging
       HASH BY RANDOM;

In contrast, the following INSERT … SELECT request selects rows from the NoPI table orders_staging and redistributes them randomly as individual rows before copying them locally into the column‑partitioned table orders. This request does not specify a LOCAL ORDER BY clause to order the selected rows locally before inserting them into the target table orders.

     INSERT INTO orders
       SELECT *
       FROM orders_staging
       HASH BY RANDOM(1,2000000000);

The following INSERT … SELECT request specifies HASH BY o_totalprice. A column name in a HASH BY clause resolves to the select list of the SELECT subquery, so o_totalprice here refers to the fourth expression in the select list, which is o_totalprice from orders_staging. Teradata Database hash‑distributes the spool generated for the SELECT on the value of that expression before copying the rows locally into the column‑partitioned table orders.

     INSERT INTO orders
       SELECT o_orderkey, o_custkey, o_orderstatus, o_totalprice,
              o_ordertsz, o_comment
       FROM orders_staging
       HASH BY o_totalprice;

Specifying a LOCAL ORDER BY clause orders the selected rows locally, first on their internal combined partition number (computed using the row partitioning of the target, and always 1 if the target is column‑partitioned without row partitioning) and then on any specified ordering expressions, before inserting them locally into the target. If the target table or underlying table of the target view is also column‑partitioned, a LOCAL ORDER BY specification can provide more effective autocompression of its column partitions for the columns on which the ordering is done.

For example, the following INSERT … SELECT request redistributes the rows selected from orders_staging by the hash value of o_orderkey to more evenly distribute them. Teradata Database then orders the selected rows locally to enable better run length compression on the o_ordertsz column partition before copying the data locally into the column‑partitioned target table orders.

     INSERT INTO orders
       SELECT *
       FROM orders_staging
       HASH BY o_orderkey
       LOCAL ORDER BY o_ordertsz;
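
When autocompression of a particular column partition is the goal, you can hash‑distribute and locally order on the same column so that equal values are both collocated on one AMP and stored adjacently. The following sketch illustrates this using the orders and orders_staging tables defined earlier; o_orderstatus is assumed here to be a low‑cardinality column, and the request is an illustration rather than a tuned recommendation.

     /* Cluster equal o_orderstatus values on the same AMP and order
        them adjacently within each AMP so the o_orderstatus column
        partition can autocompress effectively. */
     INSERT INTO orders
       SELECT *
       FROM orders_staging
       HASH BY o_orderstatus
       LOCAL ORDER BY o_orderstatus;

Note that hashing on a low‑cardinality column trades distribution evenness for compression: all rows with the same o_orderstatus value land on a single AMP.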

The HASH BY and LOCAL ORDER BY clauses are also useful for INSERT … SELECT operations on nonpartitioned NoPI tables. The following example selects rows from the primary‑indexed table tpi and redistributes them randomly as data blocks before locally copying them into the NoPI table tnopi_1.

     INSERT INTO tnopi_1 
       SELECT * 
       FROM tpi 
       HASH BY RANDOM;

In contrast, the following example selects rows from the primary‑indexed table tpi and redistributes them randomly as individual rows before locally copying them into the NoPI table tnopi_1.

     INSERT INTO tnopi_1 
       SELECT * 
       FROM tpi 
       HASH BY RANDOM(1, 2000000000);

The next example selects rows from the nonpartitioned NoPI table tnopi_1 and redistributes them randomly as data blocks before locally copying them into the NoPI table tnopi_2.

     INSERT INTO tnopi_2 
       SELECT * 
       FROM tnopi_1 
       HASH BY RANDOM;

In contrast, the following example selects rows from the nonpartitioned NoPI table tnopi_1 and redistributes them randomly as individual rows before locally copying them into the NoPI table tnopi_2.

     INSERT INTO tnopi_2 
       SELECT * 
       FROM tnopi_1 
       HASH BY RANDOM(1, 2000000000);
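
Because the options can be specified together, you can also combine HASH BY RANDOM with a LOCAL ORDER BY clause on a nonpartitioned NoPI table, keeping the distribution random while clustering equal values of a column within each AMP. The following sketch orders on column a, chosen here only for illustration.

     /* Randomly redistribute data blocks of the selected rows, then
        order the rows on column a within each AMP before inserting
        them locally into tnopi_2. */
     INSERT INTO tnopi_2
       SELECT *
       FROM tnopi_1
       HASH BY RANDOM
       LOCAL ORDER BY a;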