Optimizing Query Design - Teradata Database

SQL Data Manipulation Language

Product: Teradata Database
Release Number: 15.00
Language: English (United States)
Last Update: 2018-09-28
dita:id: B035-1146
lifecycle: previous
Product Category: Teradata® Database

Optimizing Query Design

Optimized INSERT . . . SELECT Requests

Empty Table INSERT . . . SELECT Requests and Performance

An INSERT . . . SELECT performs best when the target table is empty. If the target table has no data, INSERT . . . SELECT operates on an efficient block-by-block basis that bypasses journaling.

Normally, when inserting a row into a table, the system must make a corresponding entry into the transient journal (TJ) so that the inserted row can be rolled back if the transaction aborts.

  • For inserts into populated tables, if the transaction aborts, the system deletes all inserts from the table one row at a time by scanning the TJ for the transaction-related RowIDs.
  • For inserts into empty tables, if the transaction aborts, the system can easily return the table to its original state by deleting all rows.

    The advantages of using an optimized INSERT . . . SELECT are:

  • Block-at-a-time processing
  • Faster insert logic (that eliminates block merge complexity)
  • Instantaneous rollback for aborted INSERT . . . SELECTs

    Example

    Using multiple Regional Sales History tables, build a single summary table by combining summaries from the different regions. Then insert these summaries into a single table via a multistatement INSERT . . . SELECT statement.

    All multistatement INSERT . . . SELECT statements output to the same spool table. The output is sorted and inserted into an empty table.

    Form a multistatement request by semicolon placement in BTEQ as shown below, or by placing statements in a single macro.

    Note: If you execute each of the statements as separate requests, only the first statement is inserted into an empty table.

      INSERT INTO Summary_Table
      SELECT store, region, SUM(sales), COUNT(sale_item)
      FROM Region_1
      GROUP BY 1,2
      ;INSERT INTO Summary_Table
      SELECT store, region, SUM(sales), COUNT(sale_item)
      FROM Region_2
      GROUP BY 1,2
          . . .
      ;INSERT INTO Summary_Table
      SELECT store, region, SUM(sales), COUNT(sale_item)
      FROM Region_N
      GROUP BY 1,2;

    INSERT . . . SELECT Into an Empty SET Table

    An INSERT . . . SELECT into an empty SET table from a source known not to contain duplicate rows avoids duplicate-row checking of the target table during insertion. This bypass occurs even when inserting directly from another SET table.

    This can offer a significant performance improvement when the target table has a NUPI that is relatively nonunique, or that has a few values that are highly nonunique.
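
    As a brief sketch (the table and column names here are hypothetical), an aggregated source by definition produces no duplicate rows, so inserting its result into an empty SET table skips the duplicate-row check on the target:

      CREATE SET TABLE Sales_Summary
        (store_id    INTEGER,
         region_id   INTEGER,
         total_sales DECIMAL(18,2))
      PRIMARY INDEX (store_id);

      /* Sales_Summary is empty, and the GROUP BY guarantees the
         source rows are distinct, so no duplicate checking of the
         target is needed during the insert. */
      INSERT INTO Sales_Summary
      SELECT store_id, region_id, SUM(sales)
      FROM Sales_Stage
      GROUP BY 1,2;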

    INSERT . . . SELECT with FastLoad

    Use the optimized INSERT . . . SELECT to manipulate FastLoad data:

    1 FastLoad into a staging table.

    2 INSERT . . . SELECT into the final table, manipulating the data as required.
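
    As a sketch of step 2 (the object names are hypothetical), the data manipulation happens in the INSERT . . . SELECT on the AMPs rather than on the host:

      /* Sales_Stage was populated by FastLoad; Sales_Final is empty,
         so this request runs in the optimized block-at-a-time mode. */
      INSERT INTO Sales_Final
      SELECT store_id, region_id, sale_date, sales * exchange_rate
      FROM Sales_Stage
      WHERE sale_date >= DATE '2018-01-01';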

    FastLoad followed by INSERT . . . SELECT is faster than using an INMOD to manipulate the data on the host: the host is a single bottleneck, whereas the AMPs populate temporary tables for reports or intermediate results in parallel.

    Multiple source tables may populate the same target table. If the target table is empty before a request begins, all INSERT . . . SELECT statements in that request run in the optimized mode.

    The staging table can be a No Primary Index (NoPI) table, whose rows are not hash distributed. FastLoad runs faster when inserting into a NoPI table because no sort and no row redistribution are involved; rows are simply appended to the table evenly across all AMPs.

    Use of NoPI reduces skew in intermediate tables that have no natural primary index, and is particularly useful for staging tables. However, because the rows of a NoPI table are not hash distributed, an INSERT . . . SELECT from a NoPI table to a PI table may be slower than one between tables that share the same PI definition. Consider NoPI tables for use with BI tools and applications that generate many intermediate tables.
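
    A NoPI staging table is declared with the NO PRIMARY INDEX option, as in this hypothetical example:

      /* Rows loaded into this table are appended across the AMPs
         without hashing, sorting, or redistribution. */
      CREATE TABLE Sales_Stage
        (store_id  INTEGER,
         region_id INTEGER,
         sales     DECIMAL(18,2))
      NO PRIMARY INDEX;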


    INSERT . . . SELECT with Join Index

    The fastest way of processing inserts into a table with a join index is:

    1 Use FastLoad to load the rows into a staging table with no indexes or join indexes defined.

    2 Do an INSERT . . . SELECT from the staging table into the target table with the join index.
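
    A sketch of step 2, assuming a hypothetical target Sales_Final that has a join index defined on it:

      /* A single INSERT . . . SELECT lets the system maintain the
         join index on Sales_Final in bulk rather than row by row. */
      INSERT INTO Sales_Final
      SELECT store_id, region_id, sales
      FROM Sales_Stage;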

    If the target table has multiple join indexes defined, the Optimizer may choose to use reusable spool during join index maintenance, if applicable.

    Bulk SQL Error Logging

    Teradata Database supports bulk SQL error handling for MERGE and INSERT . . . SELECT statements. This permits bulk SQL inserts and updates to be done without the target table restrictions that apply to Teradata Database load utilities.

    Load utilities cannot operate on target tables that have unique secondary indexes, join or hash indexes, referential constraints, triggers, or LOB columns.

    USI and referential integrity (RI) violations still cause the request to abort and roll back, but only after these violations and all other supported error conditions have been logged.
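
    As a sketch (the object names are hypothetical; see SQL Data Definition Language for the CREATE ERROR TABLE syntax), error logging is requested with a LOGGING ERRORS clause:

      /* Create the error table associated with the target table. */
      CREATE ERROR TABLE FOR Sales_Final;

      /* Supported error conditions are recorded in the error table
         instead of aborting the request on the first one. */
      INSERT INTO Sales_Final
      SELECT store_id, region_id, sales
      FROM Sales_Stage
      LOGGING ERRORS WITH NO LIMIT;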

    Related Information

    For information on creating and dropping the error table, see SQL Data Definition Language.

    Using the TOP N Row Option

    As an option to the SELECT statement, the TOP N option automatically restricts the output of queries to a certain number of rows. This option provides a fast way to get a small sample of the data from a table without having to scan the entire table. For example, a user may want to examine the data in an Orders table by browsing through only 10 rows from that table.

    The value of N can be passed into the operator by means of a macro, stored procedure, or USING request modifier parameter.
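
    For example (the Orders table and the macro name are hypothetical):

      /* Browse a small sample of Orders. */
      SELECT TOP 10 *
      FROM Orders;

      /* Pass the value of N in as a macro parameter. */
      CREATE MACRO TopOrders (n INTEGER) AS
        (SELECT TOP :n *
         FROM Orders;);

      EXECUTE TopOrders (10);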

    Performance Optimizations

    The TOP N option is optimized for handling TOP N and “any N” requests. Optimizations include:

  • Adding an AMP runtime optimization for TOP n PERCENT operations.
  • Extending “any N” optimization to INSERT … SELECT and CREATE TABLE … AS requests, views, and derived tables for all values of N.
  • Adding an optimization that avoids redistributing the rows for the hash partitioning case when the grouping columns of a window function contain the PI columns of the source relation.
  • Adding a RankLimit optimization for a TOP n operation that does not specify the WITH TIES option.
  • Adding runtime optimizations for TOP n in a request that specifies an ORDER BY specification.

    Performance Considerations

    For best performance, use the TOP N option instead of the QUALIFY clause with RANK or ROW_NUMBER.

  • In the best cases, the TOP N option provides better performance.
  • In the worst cases, the TOP N option provides equivalent performance.
  • See “SELECT” on page 12.
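
    As a sketch against a hypothetical Orders table, these two requests return the same rows, but the TOP N form can perform better:

      /* Preferred: the TOP N option. */
      SELECT TOP 10 order_id, order_total
      FROM Orders
      ORDER BY order_total DESC;

      /* Equivalent QUALIFY form, which TOP N can outperform. */
      SELECT order_id, order_total
      FROM Orders
      QUALIFY ROW_NUMBER() OVER (ORDER BY order_total DESC) <= 10;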

    If a SELECT statement using the TOP N option does not also specify an ORDER BY clause, the performance of the SELECT statement is better with BTEQ than with FastExport.

    Using Recursive Queries

    A recursive query is a way to query hierarchies of data, such as organizational structures, bills of materials, and document hierarchies.

    Recursion is typically characterized by three steps:

  • Initialization
  • Recursion, or repeated iteration of the logic through the hierarchy
  • Termination

    Similarly, a recursive query has three execution phases:

  • Initial result set
  • Iteration based on the existing result set
  • Final query to return the final result set

    Ways to Specify a Recursive Query

    You can specify a recursive query by:

  • Preceding a query with the WITH RECURSIVE clause.
  • Creating a view using the RECURSIVE clause in a CREATE VIEW statement.

    For a complete description of the recursive query feature, with examples that illustrate how it is used and its restrictions, see SQL Fundamentals.
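
    As a brief sketch (assuming a hypothetical Employee table with employee_id and manager_id columns), a WITH RECURSIVE query that walks an organizational hierarchy:

      WITH RECURSIVE OrgChart (employee_id, manager_id, depth) AS
        (/* Initial result set: the top of the hierarchy. */
         SELECT employee_id, manager_id, 0
         FROM Employee
         WHERE manager_id IS NULL
         UNION ALL
         /* Iteration: add the direct reports of rows already in the
            result, with a depth limit to guard against cycles. */
         SELECT e.employee_id, e.manager_id, o.depth + 1
         FROM Employee e, OrgChart o
         WHERE e.manager_id = o.employee_id
         AND   o.depth < 20)
      /* Final query to return the final result set. */
      SELECT employee_id, manager_id, depth
      FROM OrgChart;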

    For information on the WITH RECURSIVE clause, see “WITH Statement Modifier” on page 60.

    For information on the RECURSIVE clause in a CREATE VIEW statement, that is, for information on recursive views, see SQL Data Definition Language.

    Performance Considerations

    The following broadly characterizes the performance impact of recursive queries with respect to execution time:

  • Using a recursive query typically shows a significant performance improvement over using temporary tables with a stored procedure. In most cases, the improvement is substantial.
  • Using the WITH RECURSIVE clause has essentially the same performance as using a recursive view.
  • When using a recursive query, it is important to place depth limits on the recursion to prevent infinite recursion when there are cycles in the underlying data.