Optimizing INSERT ... SELECT Requests - Analytics Database - Teradata Vantage

SQL Data Manipulation Language

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Teradata Vantage
Release Number
17.20
Published
June 2022
Language
English (United States)
Last Update
2024-10-04
dita:mapPath
pon1628111750298.ditamap
dita:ditavalPath
qkf1628213546010.ditaval
dita:id
esx1472246586715
lifecycle
latest
Product Category
Teradata Vantageā„¢

Empty Table INSERT ... SELECT Requests and Performance

An INSERT ... SELECT optimizes performance when the target table is empty. If the target table has no data, INSERT ... SELECT operates on an efficient block-by-block basis that bypasses journaling.

Normally, when inserting a row into a table, the system must make a corresponding entry into the transaction journal (TJ) to roll back the inserted row in case the transaction aborts.
  • For inserts into populated tables, if the transaction aborts, the system deletes all inserts from the table one row at a time by scanning the TJ for the transaction-related RowIDs.
  • For inserts into empty tables, if the transaction aborts, the system can easily return the table to its original state by deleting all rows.
The advantages of using optimized INSERT ... SELECTs are:
  • Block-at-a-time processing
  • Faster insert logic, to eliminate block merge complexity
  • Instantaneous rollback for aborted INSERT ... SELECTs

Example: INSERT ... SELECT

Using multiple Regional Sales History tables, build a single summary table by combining summaries from the different regions. Then insert these summaries into a single table via a multistatement INSERT ... SELECT statement.

All multistatement INSERT ... SELECT statements output to the same spool table. The output is sorted and inserted into an empty table.



Form a multistatement request by semicolon placement in BTEQ as follows, or by placing statements in a single macro.

If you execute each of the statements as separate requests, only the first statement is inserted into an empty table.
  INSERT into Summary_Table
  SELECT store, region,sum(sales),count(sale_item)
  FROM Region_1
  GROUP BY  1,2
  ;INSERT into Summary_Table
  SELECT region2, sum (sales), count(sale_item)
  FROM Region_2
  GROUP BY     1,2
      . . .
  ;INSERT into   Summary_Table
  SELECT region3, sum(sales), count(sale_item)
  FROM Region_N
  GROUP BY     1,2;

INSERT ... SELECT into Empty SET Table

INSERT ... SELECT into an empty SET table from a source known not to have duplicate rows avoids duplicate checking of the target table during insertion. This occurs even during direct insertion from another SET table.

This should offer significant performance improvement in cases where there is a NUPI that is relatively nonunique or has few values that are very nonunique.

INSERT ... SELECT with FastLoad

Use the optimized INSERT ... SELECT to manipulate FastLoad data:
  1. FastLoad into a staging table.
  2. INSERT ... SELECT into the final table, manipulating the data as required.

FastLoad and INSERT ... SELECT are faster than using an INMOD to manage data on the host. The host is a single bottleneck as opposed to parallel AMPs that populate temporary tables for reports or intermediate results.

Multiple source tables may populate the same target table. If the target table is empty before a request begins, all INSERT ... SELECT statements in that request run in the optimized mode.

The staging table can be a No Primary Index (NoPI) table, which has rows that are not hash distributed, as the staging table. FastLoad runs faster inserting into a NoPI table because there is no sort and no row redistribution involved; rows are simply appended to the table evenly across all AMPs.

Use of NoPI reduces skew in intermediate tables without a primary index and is useful for staging tables. However, because the rows of a NoPI table are not hashed-based, an INSERT ... SELECT from a NoPI table to a PI table may be slower than INSERT ... SELECT where the tables share the same PI definition. Consider NoPI tables for use with BI tools and applications that generate many intermediate tables

An INSERT ... SELECT from a NoPI table to a PI table can be slower than an INSERT ... SELECT from a PI table to a PI table with the same PI.

INSERT ... SELECT with Join Index

The fastest way of processing inserts into a table with a join index is:
  1. Use FastLoad to load the rows into a staging table with no indexes or join indexes defined.
  2. Do an INSERT ... SELECT from the staging table into the target table with the join index.

If the target table has multiple join indexes defined, the Optimizer may choose to use reusable spool during join index maintenance, if applicable.