Optimizing INSERT ... SELECT Requests - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
ft:locale
en-US
ft:lastEdition
2024-12-11
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

Empty Table INSERT ... SELECT Requests and Performance

An INSERT ... SELECT optimizes performance when the target table is empty. If the target table has no data, INSERT ... SELECT operates on an efficient block-by-block basis that bypasses journaling.

Typically, when inserting a row into a table, the system must make a corresponding entry into the transaction journal (TJ) to roll back the inserted row in case the transaction ends.
  • For inserts into populated tables, if the transaction ends, the system deletes all inserts from the table one row at a time by scanning the TJ for the transaction-related RowIDs.
  • For inserts into empty tables, if the transaction ends, the system can easily return the table to its original state by deleting all rows.
The advantages of using optimized INSERT ... SELECTs are:
  • Block-at-a-time processing
  • Faster insert logic, to eliminate block merge complexity
  • Instantaneous rollback for ended INSERT ... SELECTs

Example: INSERT ... SELECT

Using multiple Regional Sales History tables, build a single summary table by combining summaries from the different regions. Then insert these summaries into a single table using a multiple-statement INSERT ... SELECT statement.

All multiple-statement INSERT ... SELECT statements output to the same spool table. The output is sorted and inserted into an empty table.



Form a multiple-statement request by semicolon placement in BTEQ as follows, or by placing statements in a single macro.

If you run each of the statements as separate requests, only the first statement is inserted into an empty table.
  INSERT into Summary_Table
  SELECT store, region,sum(sales),count(sale_item)
  FROM Region_1
  GROUP BY  1,2
  ;INSERT into Summary_Table
  SELECT region2, sum (sales), count(sale_item)
  FROM Region_2
  GROUP BY     1,2
      . . .
  ;INSERT into   Summary_Table
  SELECT region3, sum(sales), count(sale_item)
  FROM Region_N
  GROUP BY     1,2;

INSERT ... SELECT into Empty SET Table

INSERT ... SELECT into an empty SET table from a source known not to have duplicate rows avoids duplicate checking of the target table during insertion. This occurs even during direct insertion from another SET table.

This offers significant performance improvement where a NUPI is relatively nonunique or has few values that are nonunique.

INSERT ... SELECT with FastLoad

Use the optimized INSERT ... SELECT to manipulate FastLoad data:
  1. FastLoad into a staging table.
  2. INSERT ... SELECT into the final table, manipulating the data as required.

FastLoad and INSERT ... SELECT are faster than using an INMOD to manage data on the host. The host is a single bottleneck as opposed to parallel AMPs that populate temporary tables for reports or intermediate results.

Multiple source tables may populate the same target table. If the target table is empty before a request begins, all INSERT ... SELECT statements in that request run in the optimized mode.

The staging table can be a No Primary Index (NoPI) table, which has rows that are not hash distributed, as the staging table. FastLoad runs faster inserting into a NoPI table because there is no sort and no row redistribution involved; rows are simply appended to the table evenly across all AMPs.

Use of NoPI reduces skew in intermediate tables without a primary index and is useful for staging tables. However, because the rows of a NoPI table are not hashed-based, an INSERT ... SELECT from a NoPI table to a PI table may be slower than INSERT ... SELECT where the tables share the same PI definition. Consider NoPI tables for use with BI tools and applications that generate many intermediate tables

An INSERT ... SELECT from a NoPI table to a PI table can be slower than an INSERT ... SELECT from a PI table to a PI table with the same PI.

INSERT ... SELECT with Join Index

The fastest way of processing inserts into a table with a join index is:
  1. Use FastLoad to load the rows into a staging table with no indexes or join indexes defined.
  2. Do an INSERT ... SELECT from the staging table into the target table with the join index.

If the target table has multiple join indexes defined, the Optimizer may choose to use reusable spool during join index maintenance, if applicable.