Optimizing Query Design
Optimized INSERT . . . SELECT Requests
Empty Table INSERT . . . SELECT Requests and Performance
An INSERT . . . SELECT optimizes performance when the target table is empty. If the target table has no data, INSERT . . . SELECT operates on an efficient block-by-block basis that bypasses transient journaling.
Normally, when a row is inserted into a table, the system must make a corresponding entry in the transient journal (TJ) so that the inserted row can be rolled back if the transaction aborts. Because an empty target table needs no row-level rollback entries, the optimized path avoids this overhead.
The advantages of using optimized INSERT . . . SELECTs are:
• Block-at-a-time insertion into the target table
• No transient journal entries for the inserted rows
Example
Using multiple Regional Sales History tables, build a single summary table by combining summaries from the different regions. Then insert these summaries into a single table via a multistatement INSERT . . . SELECT statement.
All multistatement INSERT . . . SELECT statements output to the same spool table. The output is sorted and inserted into an empty table.
Form a multistatement request by semicolon placement in BTEQ as shown below, or by placing statements in a single macro.
Note: If you execute each of the statements as separate requests, only the first statement is inserted into an empty table.
INSERT into Summary_Table
SELECT store, region, sum(sales), count(sale_item)
FROM Region_1
GROUP BY 1,2
;INSERT into Summary_Table
SELECT store, region, sum(sales), count(sale_item)
FROM Region_2
GROUP BY 1,2
. . .
;INSERT into Summary_Table
SELECT store, region, sum(sales), count(sale_item)
FROM Region_N
GROUP BY 1,2;
INSERT . . . SELECT Into an Empty SET Table
An INSERT . . . SELECT into an empty SET table from a source that is known not to contain duplicate rows avoids duplicate-row checking on the target table during insertion. This is true even when inserting directly from another SET table, whose rows are already guaranteed to be unique.
This can offer a significant performance improvement when the target table has a NUPI that is relatively nonunique, or that has a few values which are highly nonunique.
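The following sketch illustrates the case above. The table and column names are hypothetical; the point is that both tables are SET tables and the target starts out empty, so the insert can skip duplicate-row checking.

```sql
-- Source_Sales is a SET table, so its rows are already unique.
CREATE SET TABLE Source_Sales
  (store  INTEGER,
   region INTEGER,
   sales  DECIMAL(12,2))
PRIMARY INDEX (store);

-- Empty SET target with a relatively nonunique NUPI.
CREATE SET TABLE Target_Sales
  (store  INTEGER,
   region INTEGER,
   sales  DECIMAL(12,2))
PRIMARY INDEX (store);

-- Because Target_Sales is empty and the source is a SET table,
-- duplicate checking on the target is avoided during the insert.
INSERT INTO Target_Sales
SELECT store, region, sales
FROM Source_Sales;
```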
INSERT . . . SELECT with FastLoad
Use the optimized INSERT . . . SELECT to manipulate FastLoad data:
1 FastLoad into a staging table.
2 INSERT . . . SELECT into the final table, manipulating the data as required.
FastLoad followed by INSERT . . . SELECT is faster than using an INMOD to manipulate the data on the host: the host is a single bottleneck, whereas the AMPs work in parallel and can also populate temporary tables for reports or intermediate results.
Multiple source tables may populate the same target table. If the target table is empty before a request begins, all INSERT . . . SELECT statements in that request run in the optimized mode.
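A sketch of step 2, using hypothetical table and column names: after FastLoad has populated the staging table, a single INSERT . . . SELECT transforms the rows on their way into the final table. Final_Sales is assumed to be empty, so the statement runs in the optimized, block-at-a-time mode.

```sql
-- Transform staged detail rows into summary rows in one pass.
INSERT INTO Final_Sales (store, region, total_sales, item_count)
SELECT store,
       region,
       SUM(sales),
       COUNT(sale_item)
FROM Stage_Sales
GROUP BY store, region;
```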
The staging table can be a No Primary Index (NoPI) table, whose rows are not hash distributed. FastLoad runs faster when inserting into a NoPI table because no sort and no row redistribution are involved; rows are simply appended to the table, spread evenly across all AMPs.
Use of NoPI reduces skew in intermediate tables that have no natural primary index, which makes NoPI tables particularly useful for staging. However, because the rows of a NoPI table are not hash-based, an INSERT . . . SELECT from a NoPI table to a PI table may be slower than an INSERT . . . SELECT between tables that share the same PI definition. Consider NoPI tables for use with BI tools and applications that generate many intermediate tables.
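A minimal sketch of a NoPI staging table, using hypothetical names. The NO PRIMARY INDEX clause is what lets FastLoad append rows without hashing, sorting, or redistributing them.

```sql
-- NoPI staging table: rows are appended across all AMPs
-- with no hash distribution and no sort.
CREATE TABLE Stage_Sales
  (store     INTEGER,
   region    INTEGER,
   sales     DECIMAL(12,2),
   sale_item INTEGER)
NO PRIMARY INDEX;
```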
INSERT . . . SELECT with Join Index
The fastest way of processing inserts into a table with a join index is:
1 Use FastLoad to load the rows into a staging table with no indexes or join indexes defined.
2 Do an INSERT . . . SELECT from the staging table into the target table with the join index.
If the target table has multiple join indexes defined, the Optimizer may choose to use reusable spool during join index maintenance, if applicable.
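The two steps above can be sketched as follows. The table names are hypothetical; Orders is assumed to be the target table that carries the join index.

```sql
-- Step 1: index-free staging table for FastLoad
-- (no secondary, join, or hash indexes defined).
CREATE TABLE Stage_Orders
  (order_id INTEGER,
   store    INTEGER,
   amount   DECIMAL(12,2))
PRIMARY INDEX (order_id);

-- (FastLoad populates Stage_Orders here.)

-- Step 2: one INSERT . . . SELECT into the target, so join index
-- maintenance is performed once for the whole batch of rows.
INSERT INTO Orders
SELECT order_id, store, amount
FROM Stage_Orders;
```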
Bulk SQL Error Logging
Teradata Database supports bulk SQL error handling for MERGE and INSERT . . . SELECT statements. This permits bulk SQL inserts and updates to be done without the target table restrictions that apply to Teradata Database load utilities.
Load utilities cannot operate on target tables that have unique indexes, join or hash indexes, referential constraints, triggers, or LOB columns.
USI and referential integrity (RI) violations cause the request to abort and roll back, but only after these violations, and all other supported error conditions, have been logged.
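A sketch of bulk SQL error logging, assuming the hypothetical Final_Sales and Stage_Sales tables. The error table is created first, then the LOGGING ERRORS clause on the INSERT . . . SELECT directs qualifying errors into it instead of aborting the request on the first data error.

```sql
-- Create the error table associated with the target table.
CREATE ERROR TABLE FOR Final_Sales;

-- Log data errors rather than failing immediately;
-- abort only if more than 100 errors are logged.
INSERT INTO Final_Sales
SELECT store, region, sales, sale_item
FROM Stage_Sales
LOGGING ERRORS WITH LIMIT OF 100;
```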
Related Information
For information on creating and dropping the error table, see SQL Data Definition Language.
Using the TOP N Row Option
As an option to the SELECT statement, the TOP N option automatically restricts the output of queries to a certain number of rows. This option provides a fast way to get a small sample of the data from a table without having to scan the entire table. For example, a user may want to examine the data in an Orders table by browsing through only 10 rows from that table.
The value of N can be passed into the operator by means of a macro, stored procedure, or USING request modifier parameter.
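Two short sketches of the option, using a hypothetical Orders table. The second shows N passed in through a macro parameter, as described above.

```sql
-- Browse a small sample: return just 10 rows from Orders.
SELECT TOP 10 *
FROM Orders;

-- Hypothetical macro that supplies N as a parameter.
CREATE MACRO Sample_Orders (n INTEGER) AS
  (SELECT TOP :n * FROM Orders;);

EXEC Sample_Orders (10);
```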
Performance Optimizations
The TOP N option is optimized for handling both TOP N and “any N” requests. Optimizations include:
Performance Considerations
For best performance, use the TOP N option instead of the QUALIFY clause with RANK or ROW_NUMBER.
See “SELECT” on page 12.
If a SELECT statement using the TOP N option does not also specify an ORDER BY clause, the performance of the SELECT statement is better with BTEQ than with FastExport.
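The two equivalent forms can be compared directly. Both of the following return the 10 highest-value rows from a hypothetical Orders table, but the TOP N form is generally the better-performing choice.

```sql
-- Preferred: TOP N with ORDER BY.
SELECT TOP 10 order_id, amount
FROM Orders
ORDER BY amount DESC;

-- Equivalent QUALIFY form, shown for comparison only.
SELECT order_id, amount
FROM Orders
QUALIFY ROW_NUMBER() OVER (ORDER BY amount DESC) <= 10;
```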
Using Recursive Queries
A recursive query is a way to query hierarchies of data, such as an organizational structure, a bill of materials, or a document hierarchy.
Recursion is typically characterized by three steps:
• Initialization
• Repeated iteration of the logic over the intermediate result
• Termination, when no further rows satisfy the iteration condition
Similarly, a recursive query has three execution phases:
• Create an initial, or seed, result set.
• Repeat the recursive step against the result set until it returns no new rows.
• Execute a final query against the result set to return the answer.
Ways to Specify a Recursive Query
You can specify a recursive query by:
• Preceding the query with the WITH RECURSIVE statement modifier
• Using the RECURSIVE clause in a CREATE VIEW statement to create a recursive view
For a complete description of the recursive query feature, with examples that illustrate how it is used and its restrictions, see SQL Fundamentals.
For information on WITH RECURSIVE clause, see “WITH Statement Modifier” on page 60.
For information on the RECURSIVE clause in a CREATE VIEW statement, that is, for information on recursive views, see SQL Data Definition Language.
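A sketch of the WITH RECURSIVE form, assuming a hypothetical Employee table with employee_id, manager_id, and last_name columns. The seed query selects the top of the hierarchy, and the recursive step repeatedly joins back to the named result set until no new rows qualify.

```sql
WITH RECURSIVE Reports_To (employee_id, last_name, depth) AS (
  SELECT employee_id, last_name, 0
  FROM Employee
  WHERE manager_id IS NULL            -- seed: top of the hierarchy
  UNION ALL
  SELECT e.employee_id, e.last_name, r.depth + 1
  FROM Employee e, Reports_To r
  WHERE e.manager_id = r.employee_id  -- recursive step
)
SELECT employee_id, last_name, depth
FROM Reports_To
ORDER BY depth;
```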
Performance Considerations
The following broadly characterizes the performance impact of recursive queries with respect to execution time: