Empty Table INSERT ... SELECT Requests and Performance
An INSERT ... SELECT optimizes performance when the target table is empty. If the target table has no data, INSERT ... SELECT operates on an efficient block-by-block basis that bypasses journaling.
- For inserts into populated tables, if the transaction aborts, the system deletes all inserts from the table one row at a time by scanning the TJ for the transaction-related RowIDs.
- For inserts into empty tables, if the transaction aborts, the system can easily return the table to its original state by deleting all rows.
- Block-at-a-time processing
- Faster insert logic, to eliminate block merge complexity
- Instantaneous rollback for aborted INSERT ... SELECTs
Example: INSERT ... SELECT
Using multiple Regional Sales History tables, build a single summary table by combining summaries from the different regions. Then insert these summaries into a single table via a multistatement INSERT ... SELECT statement.
All multistatement INSERT ... SELECT statements output to the same spool table. The output is sorted and inserted into an empty table.
Form a multistatement request by semicolon placement in BTEQ as follows, or by placing statements in a single macro.
INSERT into Summary_Table SELECT store, region,sum(sales),count(sale_item) FROM Region_1 GROUP BY 1,2 ;INSERT into Summary_Table SELECT region2, sum (sales), count(sale_item) FROM Region_2 GROUP BY 1,2 . . . ;INSERT into Summary_Table SELECT region3, sum(sales), count(sale_item) FROM Region_N GROUP BY 1,2;
INSERT ... SELECT into Empty SET Table
INSERT ... SELECT into an empty SET table from a source known not to have duplicate rows avoids duplicate checking of the target table during insertion. This occurs even during direct insertion from another SET table.
This should offer significant performance improvement in cases where there is a NUPI that is relatively nonunique or has few values that are very nonunique.
INSERT ... SELECT with FastLoad
- FastLoad into a staging table.
- INSERT ... SELECT into the final table, manipulating the data as required.
FastLoad and INSERT ... SELECT are faster than using an INMOD to manage data on the host. The host is a single bottleneck as opposed to parallel AMPs that populate temporary tables for reports or intermediate results.
Multiple source tables may populate the same target table. If the target table is empty before a request begins, all INSERT ... SELECT statements in that request run in the optimized mode.
The staging table can be a No Primary Index (NoPI) table, which has rows that are not hash distributed, as the staging table. FastLoad runs faster inserting into a NoPI table because there is no sort and no row redistribution involved; rows are simply appended to the table evenly across all AMPs.
Use of NoPI reduces skew in intermediate tables without a primary index and is useful for staging tables. However, because the rows of a NoPI table are not hashed-based, an INSERT ... SELECT from a NoPI table to a PI table may be slower than INSERT ... SELECT where the tables share the same PI definition. Consider NoPI tables for use with BI tools and applications that generate many intermediate tables
An INSERT ... SELECT from a NoPI table to a PI table can be slower than an INSERT ... SELECT from a PI table to a PI table with the same PI.
INSERT ... SELECT with Join Index
- Use FastLoad to load the rows into a staging table with no indexes or join indexes defined.
- Do an INSERT ... SELECT from the staging table into the target table with the join index.
If the target table has multiple join indexes defined, the Optimizer may choose to use reusable spool during join index maintenance, if applicable.