This topic describes the general case method by which Teradata Database maintains a join index during simple delete operations.
The following EXPLAIN request shows this method applied to a simple DELETE request.
Note the following items in the EXPLAIN output:
This step … Does this …
----------- -------------------------------------------------------------
5           Reproduces the affected portion of the join index rows.
6.1         Deletes the corresponding rows from the join index.
8           Inserts new nonmatching outer join rows into the join index.
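The topic does not show the definition of OrderJoinLine, so the following DDL is only a plausible sketch: the column list is an assumption, and a left outer join from lineitem to ordertbl is assumed because step 8 must be able to insert nonmatching outer join rows once their matching orders are deleted.

```sql
-- Hypothetical definition; the actual OrderJoinLine DDL is not shown
-- in this topic. The outer join direction and column list are assumed.
CREATE JOIN INDEX OrderJoinLine AS
SELECT o_custkey, o_orderkey, l_orderkey, l_partkey, l_quantity
FROM lineitem
LEFT OUTER JOIN ordertbl ON l_orderkey = o_orderkey;
```

With a definition like this, a lineitem row whose order is deleted does not disappear from the index; it must instead be preserved as a nonmatching row with NULLs in the order-side columns, which is what step 8 accomplishes.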
EXPLAIN
DELETE FROM ordertbl
WHERE o_custkey = 1001;
*** Help information returned. 37 rows.
*** Total elapsed time was 2 seconds.
Explanation
-----------------------------------------------------------------------
1) First, we lock a distinct df2."pseudo table" for write on a
RowHash to prevent global deadlock for df2.OrderJoinLine.
2) Next, we lock a distinct df2."pseudo table" for write on a
RowHash to prevent global deadlock for df2.Ordertbl.
3) We lock a distinct df2."pseudo table" for read on a RowHash to
prevent global deadlock for df2.lineitem.
4) We lock df2.OrderJoinLine for write, we lock df2.Ordertbl
for write, and we lock df2.lineitem for read.
5) We do an all-AMPs JOIN step from df2.Ordertbl by way of a
RowHash match scan with a condition of (
"df2.Ordertbl.o_custkey = 1001"), which is joined to
df2.lineitem. df2.Ordertbl and df2.lineitem are joined
using a merge join, with a join condition of (
"df2.lineitem.l_orderkey = df2.Ordertbl.o_orderkey"). The
input tables df2.Ordertbl and df2.lineitem will not be
cached in memory, but they are eligible for synchronized scanning.
The result goes into Spool 1, which is built locally on the AMPs.
Then we do a SORT to order Spool 1 by row hash. The result spool
file will not be cached in memory. The size of Spool 1 is
estimated to be 100,000 rows. The estimated time for this step is
6 minutes and 5 seconds.
6) We execute the following steps in parallel.
1) We do a MERGE DELETE to df2.OrderJoinLine from Spool 1.
2) We do an all-AMPs DELETE from df2.Ordertbl by way of an
all-rows scan with a condition of
("df2.Ordertbl.o_custkey = 1001").
7) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
an all-rows scan into Spool 2, which is redistributed by hash code
to all AMPs. Then we do a SORT to order Spool 2 by join index.
The result spool file will not be cached in memory. The size of
Spool 2 is estimated to be 100,000 rows. The estimated time for
this step is 35 minutes and 59 seconds.
8) We do a MERGE into df2.OrderJoinLine from Spool 2 (Last Use).
9) We spoil the parser's dictionary cache for the table.
10) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
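Steps 6.1, 7, and 8 together can be read as a delete followed by a re-insert. The sketch below shows the logical effect only: join indexes cannot be modified directly with DML (Teradata Database performs this maintenance internally), and the column names are assumptions consistent with the hypothetical outer join definition above, not taken from this topic.

```sql
-- Illustration only: not executable against a join index.

-- Step 6.1: remove the index rows that matched the deleted orders.
DELETE FROM OrderJoinLine WHERE o_custkey = 1001;

-- Steps 7 and 8: re-insert the orphaned line items (staged in Spool 1
-- and redistributed as Spool 2) as nonmatching outer join rows, with
-- the order-side columns set to NULL.
INSERT INTO OrderJoinLine
SELECT NULL, NULL, l.l_orderkey, l.l_partkey, l.l_quantity
FROM lineitem l
JOIN ordertbl o ON l.l_orderkey = o.o_orderkey
WHERE o.o_custkey = 1001;
```

This is why the request needs both a MERGE DELETE (step 6.1) and a later MERGE (step 8) against OrderJoinLine: deleting the matched rows alone would silently drop the outer-table rows from the index.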