An Example of How the Optimizer Takes Advantage of Referential Constraints - Analytics Database - Teradata Vantage

SQL Data Definition Language Detailed Topics

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Teradata Vantage
Release Number
17.20
Published
June 2022
Language
English (United States)
Last Update
2024-10-04
dita:mapPath
vuk1628111288877.ditamap
dita:ditavalPath
qkf1628213546010.ditaval
dita:id
jbg1472252759029
lifecycle
latest
Product Category
Teradata Vantage™

To understand how the Optimizer can produce more high-performing query plans when a Referential Constraint is defined, consider the following table definitions. The only difference between the second and third tables is that tb2 defines a Referential Constraint on column b referencing column a in tb1, while tb3 does not define that constraint.

     CREATE TABLE tb1 (
       a INTEGER NOT NULL PRIMARY KEY,
       b INTEGER,
       c INTEGER);

     CREATE TABLE tb2 (
       a INTEGER NOT NULL,
       b INTEGER,
       c INTEGER,
     CONSTRAINT ref1
     FOREIGN KEY (b) REFERENCES WITH NO CHECK OPTION tb1(a));

     CREATE TABLE tb3 (
       a INTEGER NOT NULL,
       b INTEGER,
       c INTEGER);

The following EXPLAIN report shows the plan for a query on tables tb1 and tb3 when no Referential Constraints have been defined. In particular, notice the join in Step 5.

     EXPLAIN
     SELECT tb1.a, tb3.a, MAX(tb3.c)
     FROM tb1, tb3
     GROUP BY tb1.a, tb3.a
     WHERE tb1.a = tb3.b
     ORDER BY 1;

Result:

    Explanation
    -------------------------------------------------------------------
  1) First, we lock MyDB.tb3 for read on a reserved RowHash to prevent
     global deadlock.
  2) Next, we lock MyDB.tb1 for read on a reserved RowHash to prevent
     global deadlock.
  3) We lock MyDB.tb3 for read, and we lock MyDB.tb1 for read.
  4) We do an all-AMPs RETRIEVE step from MyDB.tb3 by way of an all-rows
     scan with a condition of ("NOT (MyDB.tb3.b IS NULL)") into Spool 4
     (all_amps), which is redistributed by the hash code of (MyDB.tb3.b)
     to all AMPs.  Then we do a SORT to order Spool 4 by row hash.  The
     size of Spool 4 is estimated with low confidence to be 4 rows (
     100 bytes).  The estimated time for this step is 0.03 seconds.
  5) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of a
     RowHash match scan, which is joined to MyDB.tb1 by way of a RowHash
     match scan with no residual conditions.  Spool 4 and MyDB.tb1 are
     joined using a merge join, with a join condition of (""MyDB.tb1.a =
     b"").  The result goes into Spool 3 (all_amps), which is built"
     locally on the AMPs.  The size of Spool 3 is estimated with index
     join confidence to be 4 rows (108 bytes).  The estimated time for
     this step is 0.11 seconds.
  6) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
     way of an all-rows scan , grouping by field1 ( MyDB.tb1.a
     ,MyDB.tb3.a).  Aggregate Intermediate Results are computed locally,
     then placed in Spool 5.  The size of Spool 5 is estimated with low
     confidence to be 4 rows (164 bytes).  The estimated time for this
     step is 0.08 seconds.
  7) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of
     an all-rows scan into Spool 1 (group_amps), which is built locally
     on the AMPs.  Then we do a SORT to order Spool 1 by the sort key
     in spool field1 (MyDB.tb1.a).  The size of Spool 1 is estimated
     with low confidence to be 4 rows (132 bytes).  The estimated time
     for this step is 0.08 seconds.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.29 seconds.

The following EXPLAIN shows the query plan for the same query, but this time on tables tb1 and tb2, with a Referential Constraint defined between column b of table tb2 and column a of table tb1. Notice that the Optimizer has recognized that the join in step 5 of the previous query plan is unnecessary and can be eliminated. The same query plan would be produced if tb1.a and tb2.b had an explicitly declared standard Referential Integrity constraint.

     EXPLAIN
     SELECT tb1.a, tb2.a, MAX(tb2.c)
     FROM tb1, tb2
     GROUP BY tb1.a, tb2.a
     WHERE tb1.a = tb2.b
     ORDER BY 1;

Result:

 Explanation
    -------------------------------------------------------------------
  1) First, we lock MyDB.tb2 for read on a reserved RowHash to prevent
     global deadlock.
  2) Next, we lock MyDB.tb2 for read.
  3) We do an all-AMPs SUM step to aggregate from MyDB.tb2 by way of an
     all-rows scan with a condition of ("NOT (MyDB.tb2.b IS NULL)")
     , grouping by field1 ( MyDB.tb2.b ,MyDB.tb2.a).  Aggregate
     Intermediate Results are computed locally, then placed in Spool 3.
     The size of Spool 3 is estimated with low confidence to be 4 rows
     (164 bytes).  The estimated time for this step is 0.07 seconds.
  4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
     an all-rows scan into Spool 1 (group_amps), which is built locally
     on the AMPs.  Then we do a SORT to order Spool 1 by the sort key
     in spool field1 (MyDB.tb2.b).  The size of Spool 1 is estimated
     with low confidence to be 4 rows (132 bytes).  The estimated time
     for this step is 0.08 seconds.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.15 seconds.