16.10 - EXPLAIN Request Modifier and Partitioned Primary Index Access - Teradata Database

Teradata Database SQL Request and Transaction Processing

prodname
Teradata Database
vrm_release
16.10
created_date
June 2017
category
Programming Reference
User Guide
featnum
B035-1142-161K

EXPLAIN reports indicate partition accesses, deletions, joins, and eliminations performed during query optimization.

Partitioned Primary Index Examples

This set of examples shows some of the more important EXPLAIN text phrases associated with optimizing queries made on tables that have partitioned primary indexes.

The following example demonstrates an EXPLAIN report for accessing a subset of row partitions for a SELECT request. The relevant phrase is highlighted in boldface type.

     CREATE TABLE t1 (
       a INTEGER,
       b INTEGER)
     PRIMARY INDEX(a) PARTITION BY RANGE_N(b BETWEEN 1
                                             AND    10
                                             EACH    1);
   
     EXPLAIN SELECT *
             FROM t1
             WHERE b > 2;
  1) First, we lock a distinct mws."pseudo table" for read on a RowHash
     to prevent global deadlock for mws.t1.
  2) Next, we lock mws.t1 for read.
  3) We do an all-AMPs RETRIEVE step from  8 partitions of  mws.t1
     and with a condition of ("mws.t1.b > 2") into Spool 1
     (group_amps), which is built locally on the AMPs. The size of
     Spool 1 is estimated with no confidence to be 1 row. The
     estimated time for this step is 0.14 seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1. The total estimated time is 0.14 seconds.

The following example demonstrates an EXPLAIN report for a SELECT request with an equality constraint on the partitioning column. The relevant phrase is highlighted in boldface type. The report indicates that all rows in a single row partition are scanned across all AMPs.

     CREATE TABLE t1 (
       a INTEGER,
       b INTEGER)
     PRIMARY INDEX(a) PARTITION BY RANGE_N(b BETWEEN 1
                                             AND    10
                                             EACH    1);
     EXPLAIN SELECT *
             FROM t1
             WHERE t1.b = 1;
1) First, we lock mws.t1 for read on a reserved RowHash in a single
     partition to prevent global deadlock.
2) Next, we lock mws.t1 for read on a single partition.
3) We do an all-AMPs RETRIEVE step from  a single partition of  mws.t1
   with a condition of ("mws.t1.b = 1") into Spool 1
   (group_amps), which is built locally on the AMPs. The size of
   Spool 1 is estimated with no confidence to be 2 rows. The
   estimated time for this step is 0.15 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
   statement 1. The total estimated time is 0.15 seconds.

The following example demonstrates an EXPLAIN request report for row-partitioned primary index access without any constraints on the partitioning column. The relevant phrase is in boldface type. The report indicates that all row partitions are accessed by way of the primary index on a single AMP.

     CREATE TABLE t1 (
       a INTEGER,
       b INTEGER)
     PRIMARY INDEX(a)
     PARTITION BY RANGE_N(b BETWEEN 1
                            AND    10
                            EACH 1);
     EXPLAIN SELECT *
             FROM t1
             WHERE t1.a = 1;
1) First, we do a single-AMP RETRIEVE step from  all partitions 
      of  mws2.t1 by way of the primary index "mws2.t1.a = 1" with no
     residual conditions into Spool 1 (one_amp), which is built
     locally on that AMP. The size of Spool 1 is estimated with low
     confidence to be 2 rows. The estimated time for this step is 0.15
     seconds.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1. The total estimated time is 0.15 seconds.

The following example demonstrates the processing of a SELECT request without any row partition elimination. The phrase n partitions of does not occur in the report.

     CREATE TABLE t1 (
       a INTEGER,
       b INTEGER)
     PRIMARY INDEX(a)
     PARTITION BY RANGE_N(b BETWEEN 1
                            AND    10
                            EACH    1);
     EXPLAIN SELECT *
             FROM t1
             WHERE b>-1;
  1) First, we lock a distinct mws."pseudo table" for read on a RowHash
     to prevent global deadlock for mws.t1.
  2) Next, we lock mws.t1 for read.
  3) We do an all-AMPs RETRIEVE step from mws.t1 by way of an all-rows
     scan with a condition of ("mws.t1.b > -1") into Spool 1
     (group_amps), which is built locally on the AMPs. The size of
     Spool 1 is estimated with no confidence to be 1 row. The
     estimated time for this step is 0.15 seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1. The total estimated time is 0.15 seconds.

Two steps are generated to perform the partial and full row partition deletions, respectively, as demonstrated in the following EXPLAIN reports. The relevant phrases appear in boldface type.

     CREATE TABLE t2 (
        a INTEGER,
        b INTEGER)
     PRIMARY INDEX(a)
     PARTITION BY RANGE_N(b BETWEEN 1 
                            AND    10
                            EACH 2);
     EXPLAIN DELETE
             FROM t2
             WHERE b BETWEEN 4 AND 7;
  1) First, we lock mws.t2 for write on a reserved RowHash in all
     partitions to prevent global deadlock.
  2) Next, we lock mws.t2 for write.
  3) We do an all-AMPs DELETE from  2 partitions of  mws.t2
     with a condition of ("(mws.t2.b <= 7) AND (mws.t2.b >= 4)").
  4) We do an all-AMPs DELETE  of a single partition  from mws.t2 with a
     condition of ("(mws.t2.b <= 7) AND (mws.t2.b >= 4)").
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.
     EXPLAIN DELETE
             FROM t2
             WHERE b BETWEEN 4 AND 8;
   
  1) First, we lock mws.t2 for write on a reserved RowHash in all
     partitions to prevent global deadlock.
  2) Next, we lock mws.t2 for write.
  3) We do an all-AMPs DELETE from  a single partition of  mws.t2
     with a condition of ("(mws.t2.b <= 8) AND (mws.t2.b >= 4)").
  4) We do an all-AMPs DELETE  of 2 partitions  from mws.t2 with a
     condition of ("(mws.t2.b <= 8) AND (mws.t2.b >= 4)").
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.

The following example demonstrates a spool with a row-partitioned primary index and a rowkey-based merge join. The relevant phrases appear in boldface type.

     CREATE TABLE t3 (
        a INTEGER,
        b INTEGER)
     PRIMARY INDEX(a);
   
     CREATE TABLE t4 (
        a INTEGER,
        b INTEGER)
     PRIMARY INDEX(a)
     PARTITION BY b;
     EXPLAIN SELECT *
             FROM t3, t4
             WHERE t3.a = t4.a
             AND   t3.b = t4.b;
    
  1) First, we lock mws.t2 for write on a reserved RowHash in all
     partitions to prevent global deadlock.
  2) Next, we lock a distinct mws."pseudo table" for read on a RowHash
     to prevent global deadlock for mws.t4.
  3) We lock mws.t3 for read, and we lock mws.t4 for read.
  4) We do an all-AMPs RETRIEVE step from mws.t3 by way of an all-rows
     scan with a condition of ("(NOT (mws.t3.b IS NULL )) AND (NOT
     (mws.t3.a IS NULL ))") into Spool 2 (all_amps), which is built
     locally on the AMPs. Then we do an all-AMPs  SORT to partition 
      Spool 2 by rowkey. The size of Spool 2 is estimated with no
     confidence to be 2 rows. The estimated time for this step is
     0.03 seconds.
  5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
     RowHash match scan, which is joined to mws.t4 with a condition of
     ("NOT (mws.t4.a IS NULL)"). Spool 2 and mws.t4 are joined using
      a rowkey-based  merge join, with a join condition of
     ("(a = mws.t4.a) AND (b = mws.t4.b)"). The input table mws.t4 will
     not be cached in memory. The result goes into Spool 1 (all_amps),
     which is built locally on the AMPs. The size of Spool 1 is 
     estimated with no confidence to be 1 row. The estimated time for
     this step is 0.20 seconds.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1. The total estimated time is 0.22 seconds.

The following example demonstrates one step for joining two tables having the same partitioning and primary keys. The relevant phrases appear in boldface type.

     CREATE TABLE orders (
       o_orderkey      INTEGER NOT NULL,
       o_custkey       INTEGER,
       o_orderstatus   CHARACTER(1) CASESPECIFIC,
       o_totalprice    DECIMAL(13,2) NOT NULL,
       o_orderdate     DATE FORMAT 'yyyy-mm-dd' NOT NULL,
       o_orderpriority CHARACTER(21),
       o_clerk         CHARACTER(16),
       o_shippriority  INTEGER,
       o_comment       VARCHAR(79))
     PRIMARY INDEX (o_orderkey)
     PARTITION BY RANGE_N(o_orderdate BETWEEN DATE '1992-01-01'
                                      AND     DATE '1998-12-31'
                                      EACH INTERVAL '1' MONTH)
     UNIQUE INDEX (o_orderkey);
     CREATE TABLE lineitem (
       l_orderkey      INTEGER NOT NULL,
       l_partkey       INTEGER NOT NULL,
       l_suppkey       INTEGER,
       l_linenumber    INTEGER,
       l_quantity      INTEGER NOT NULL,
       l_extendedprice DECIMAL(13,2) NOT NULL,
       l_discount      DECIMAL(13,2),
       l_tax           DECIMAL(13,2),
       l_returnflag    CHARACTER(1),
       l_linestatus    CHARACTER(1),
       l_shipdate      DATE FORMAT 'yyyy-mm-dd',
       l_commitdate    DATE FORMAT 'yyyy-mm-dd',
       l_receiptdate   DATE FORMAT 'yyyy-mm-dd',
       l_shipinstruct  VARCHAR(25),
       l_shipmode      VARCHAR(10),
       l_comment       VARCHAR(44))
     PRIMARY INDEX (l_orderkey)
     PARTITION BY RANGE_N(l_shipdate BETWEEN DATE '1992-01-01'
                                     AND     DATE '1998-12-31'
                                     EACH INTERVAL '1' MONTH);
     EXPLAIN SELECT *
             FROM lineitem, ordertbl
             WHERE  l_orderkey = o_orderkey
             AND    l_shipdate = o_orderdate
             AND   (o_orderdate < DATE '1993-10-01')
             AND   (o_orderdate >= DATE '1993-07-01')
             ORDER BY o_orderdate, l_orderkey;
…
…
3) We do an all-AMPs JOIN step from  3 partitions of  TH.ORDERTBL
   with a condition of (
   "(TH.ORDERTBL.O_ORDERDATE < DATE '1993-10-01') AND
   (TH.ORDERTBL.O_ORDERDATE >= DATE '1993-07-01')"),
   which is joined to TH.LINEITEM with a condition of (
   "TH.LINEITEM.L_COMMITDATE < TH.LINEITEM.L_RECEIPTDATE").
   TH.ORDERTBL and TH.LINEITEM are joined using
   a rowkey-based  inclusion merge join, with a join condition of (
   "TH.LINEITEM.L_ORDERKEY = TH.ORDERTBL.O_ORDERKEY").
   The input tables TH.ORDERTBL and TH.LINEITEM will
   not be cached in memory. The result goes into Spool 3 (all_amps),
   which is built locally on the AMPs. The size of Spool 3 is
   estimated with no confidence to be 7,739,047 rows. The
   estimated time for this step is 1 hour and 34 minutes.
…
…

The following example demonstrates row partition elimination in an aggregation. The relevant phrase appears in boldface type.

     CREATE TABLE t1 (
       a INTEGER,
       b INTEGER)
     PRIMARY INDEX(a) PARTITION BY RANGE_N(b BETWEEN 1
                                             AND    10
                                             EACH 1);
     EXPLAIN SELECT MAX(a)
             FROM t1
             WHERE b > 3;
Explanation
-----------------------------------------------------------------------
  1) First, we lock a distinct mws."pseudo table" for read on a RowHash
     to prevent global deadlock for mws.t1.
  2) Next, we lock mws.t1 for read.
  3) We do a SUM step to aggregate from  7 partitions of  mws.t1
     with a condition of ("mws.t1.b > 3"). Aggregate
     Intermediate Results are computed globally, then placed in Spool
     3. The input table will not be cached in memory, but it is
     eligible for synchronized scanning. The size of Spool 3 is
     estimated with high confidence to be 1 row. The estimated time
     for this step is 2.35 seconds.
  4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
     an all-rows scan into Spool 1 (group_amps), which is built locally
     on the AMPs. The size of Spool 1 is estimated with high
     confidence to be 1 row. The estimated time for this step is 0.17
     seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.

New terminology in these explanations are defined as follows:

Phrase Definition
n partitions of Only n of the row partitions are accessed, where n > 1. In this case, n = 8.
a single partition of Only one row partition is accessed in processing this request.
all partitions of All row partitions are accessed for primary index access in processing this request.
of a single partition The Optimizer determined that all rows in a single row partition can be deleted. In some cases, this allows for faster deletion of the entire partition.
of n partitions The Optimizer determined that all rows in each of n row partitions can be deleted, where n > 1. In some cases, this allows for faster deletion of entire row partitions.
SORT to partition Spool n by rowkey. The Optimizer determined that a spool is to be row-partitioned based on the same partitioning expression as a table to which the spool is to be joined.

That is, the spool is to be sorted by rowkey (partition and hash). Partitioning the spool in this way enables a faster join with the row-partitioned table.

n is the spool number.

a rowkey-based The join is hash-based by row partition (rowkey). In this case, there are equality constraints on both the partitioning and primary index columns. This enables a faster join since each uneliminated row partition needs to be joined with at most only one other row partition.

When this phrase is not reported, then the join is hash-based. That is, there are equality constraints on the primary index columns from which the hash is derived. For a row-partitioned table, there is additional overhead incurred by processing the table in hash order.

Note that with either method, the join conditions must still be validated.