n-Way Join Example - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

The tables in this example are on the Block File System.

This example uses the following schema. The large fact table, store_sales_cpa, is a columnar table (PARTITION BY (COLUMN...)).

CREATE MULTISET TABLE TPCDS.store_sales_cpa ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     DATABLOCKSIZE = 262144 BYTES, CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1
     (
      ss_sold_date_sk INTEGER,
      ss_sold_time_sk INTEGER,
      ss_item_sk INTEGER NOT NULL,
      ss_customer_sk INTEGER,
      ss_cdemo_sk INTEGER,
      ss_hdemo_sk INTEGER,
      ss_addr_sk INTEGER,
      ss_store_sk INTEGER,
      ss_promo_sk INTEGER,
      ss_ticket_number INTEGER NOT NULL,
      ss_quantity INTEGER,
      ss_wholesale_cost DECIMAL(7,2),
      ss_list_price DECIMAL(7,2),
      ss_sales_price DECIMAL(7,2),
      ss_ext_discount_amt DECIMAL(7,2),
      ss_ext_sales_price DECIMAL(7,2),
      ss_ext_wholesale_cost DECIMAL(7,2),
      ss_ext_list_price DECIMAL(7,2),
      ss_ext_tax DECIMAL(7,2),
      ss_coupon_amt DECIMAL(7,2),
      ss_net_paid DECIMAL(7,2),
      ss_net_paid_inc_tax DECIMAL(7,2),
      ss_net_profit DECIMAL(7,2),
CONSTRAINT ss_i FOREIGN KEY ( ss_item_sk ) REFERENCES WITH NO CHECK OPTION
 TPCDS.item ( i_item_sk ),
CONSTRAINT ss_d FOREIGN KEY ( ss_sold_date_sk ) REFERENCES WITH NO CHECK O
PTION TPCDS.date_dim ( d_date_sk ),
CONSTRAINT ss_c FOREIGN KEY ( ss_customer_sk ) REFERENCES WITH NO CHECK OP
TION TPCDS.customer ( c_customer_sk ))
PRIMARY AMP INDEX ( ss_item_sk )
PARTITION BY ( COLUMN NO AUTO COMPRESS ADD 10,RANGE_N(ss_sold_date_sk  BET
WEEN 2450800  AND 2453100  EACH 1 ,
 NO RANGE OR UNKNOWN) );

CREATE MULTISET TABLE TPCDS.date_dim ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     DATABLOCKSIZE = 262144 BYTES, CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1
     (
      d_date_sk INTEGER NOT NULL,
      d_date_id CHAR(16) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      d_date DATE FORMAT 'YY/MM/DD',
      d_month_seq INTEGER,
      d_week_seq INTEGER,
      d_quarter_seq INTEGER,
      d_year INTEGER,
      d_dow INTEGER,
      d_moy INTEGER,
      d_dom INTEGER,
      d_qoy INTEGER,
      d_fy_year INTEGER,
      d_fy_quarter_seq INTEGER,
      d_fy_week_seq INTEGER,
      d_day_name CHAR(9) CHARACTER SET LATIN NOT CASESPECIFIC,
      d_quarter_name CHAR(6) CHARACTER SET LATIN NOT CASESPECIFIC,
      d_holiday CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      d_weekend CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      d_following_holiday CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      d_first_dom INTEGER,
      d_last_dom INTEGER,
      d_same_day_ly INTEGER,
      d_same_day_lq INTEGER,
      d_current_day CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      d_current_week CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      d_current_month CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      d_current_quarter CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      d_current_year CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( d_date_sk );

CREATE MULTISET TABLE TPCDS.store ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     DATABLOCKSIZE = 262144 BYTES, CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1
     (
      s_store_sk INTEGER NOT NULL,
      s_store_id CHAR(16) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      s_rec_start_date DATE FORMAT 'YY/MM/DD',
      s_rec_end_date DATE FORMAT 'YY/MM/DD',
      s_closed_date_sk INTEGER,
      s_store_name VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_number_employees INTEGER,
      s_floor_space INTEGER,
      s_hours CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_manager VARCHAR(40) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_market_id INTEGER,
      s_geography_class VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,

      s_market_desc VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_market_manager VARCHAR(40) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_division_id INTEGER,
      s_division_name VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_company_id INTEGER,
      s_company_name VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_street_number VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_street_name VARCHAR(60) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_street_type CHAR(15) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_suite_number CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_city VARCHAR(60) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_county VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_state CHAR(2) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_zip CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_country VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,
      s_gmt_offset DECIMAL(5,2),
      s_tax_precentage DECIMAL(5,2))
PRIMARY INDEX ( s_store_sk );

CREATE MULTISET TABLE TPCDS.household_demographics ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     DATABLOCKSIZE = 262144 BYTES, CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1
     (
      hd_demo_sk INTEGER NOT NULL,
      hd_income_band_sk INTEGER,
      hd_buy_potential CHAR(15) CHARACTER SET LATIN NOT CASESPECIFIC,
      hd_dep_count INTEGER,
      hd_vehicle_count INTEGER)
PRIMARY INDEX ( hd_demo_sk );

In the following query, the dimension tables must be filtered using WHERE clauses in order for the Optimizer to consider using an n-way join optimization. Step 3 of the EXPLAIN output shows the specific phrase that indicates use of an n-way join.

EXPLAIN
SELECT ss_ticket_number,
               ss_customer_sk,
               store.s_city,
               Sum(ss_coupon_amt) amt,
               Sum(ss_net_profit) profit
        FROM   store_sales,
               date_dim,
               store,
               household_demographics
        WHERE  store_sales.ss_sold_date_sk = date_dim.d_date_sk
               AND store_sales.ss_store_sk = store.s_store_sk
               AND store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
               AND ( household_demographics.hd_dep_count = 6
                      OR household_demographics.hd_vehicle_count > 2 )
               AND date_dim.d_dow = 1
               AND date_dim.d_year IN ( 1999, (SELECT 1999 + 1),
                                        (SELECT 1999 + 2) )
               AND store.s_number_employees BETWEEN 200 AND 295
        GROUP  BY ss_ticket_number,
                  ss_customer_sk,
                  ss_addr_sk,
                  store.s_city;
 *** Help information returned. 67 rows.
 *** Total elapsed time was 1 second.

Explanation
---------------------------------------------------------------------------
 This request is eligible for incremental planning and execution (IPE).
 The following is the static plan for the request.
  1) First, we lock tpcds.store_sales_cpa in view store_sales in
     TD_MAP1 for access, we lock TPCDS.store in TD_MAP1 for access, we
     lock TPCDS.date_dim in TD_MAP1 for access, and we lock
     TPCDS.household_demographics in TD_MAP1 for access.
  2) Next, we execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step in TD_MAP1 from
          TPCDS.date_dim by way of an all-rows scan with a condition of
          ("(TPCDS.date_dim.d_dow = 1) AND ((TPCDS.date_dim.d_year >=
          1999) AND (TPCDS.date_dim.d_year <= 2001 ))") into Spool 3
          (all_amps), which is duplicated on all AMPs in TD_Map1.  Then
          we do a SORT to partition by rowkey.  The size of Spool 3 is
          estimated with high confidence to be 11,304 rows (192,168
          bytes).  The estimated time for this step is 0.01 seconds.
       2) We do an all-AMPs RETRIEVE step in TD_MAP1 from
          TPCDS.household_demographics by way of an all-rows scan with
          a condition of (
          "(TPCDS.household_demographics.hd_vehicle_count >= 3) OR
          (TPCDS.household_demographics.hd_dep_count = 6)") into Spool
          4 (all_amps), which is duplicated on all AMPs in TD_Map1.
          The size of Spool 4 is estimated with low confidence to be
          224,640 rows (3,818,880 bytes).  The estimated time for this
          step is 0.01 seconds.
       3) We do an all-AMPs RETRIEVE step in TD_MAP1 from TPCDS.store
          by way of an all-rows scan with a condition of (
          "(TPCDS.store.s_number_employees <= 295) AND
          (TPCDS.store.s_number_employees >= 200)") into Spool 5
          (all_amps), which is duplicated on all AMPs in TD_Map1.  The
          size of Spool 5 is estimated with high confidence to be
          90,792 rows (3,540,888 bytes).  The estimated time for this
          step is 0.01 seconds.
  3) We do an all-AMPs JOIN step in TD_Map1 from 3 left sources, which
     is joined to 20718 combined partitions (9 column partitions) of
     tpcds.store_sales_cpa in view store_sales using a 4-way dynamic
     hash join.  The left sources are:
        a) Spool 3 (Last Use) by way of an all-rows scan, with a join
           condition of ("ss_sold_date_sk =
           tpcds.store_sales_cpa.d_date_sk") enhanced by dynamic
           partition elimination.
        b) Spool 4 (Last Use) by way of an all-rows scan, with a join
           condition of ("ss_hdemo_sk = hd_demo_sk").
        c) Spool 5 (Last Use) by way of an all-rows scan, with a join
           condition of ("ss_store_sk = s_store_sk").
     The result goes into Spool 2 (all_amps), which is built
     locally on the AMPs.  The result spool file is not
     cached in memory.  The size of Spool 2 is estimated with low
     confidence to be 280,646,263 rows (15,996,836,991 bytes).
     The estimated time for this step is 24.38 seconds.
  4) We do an all-AMPs SUM step in TD_Map1 to aggregate from Spool 2 by
     way of an all-rows scan, grouping by field1 (ss_ticket_number
     ,ss_customer_sk ,ss_addr_sk ,s_city).  Aggregate Intermediate
     Results are computed globally, then placed in Spool 7 in TD_Map1.
     The aggregate spool file is not cached in memory.  The size
     of Spool 7 is estimated with low confidence to be 280,646,263 rows
     (38,448,538,031 bytes).  The estimated time for this step is 4
     minutes and 44 seconds.
  5) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 7 by way of
     an all-rows scan into Spool 1 (group_amps), which is built locally
     on the AMPs.  The result spool file is not cached in memory.
     The size of Spool 1 is estimated with low confidence to be
     280,646,263 rows (33,958,197,823 bytes).  The estimated time for
     this step is 22.41 seconds.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 5 minutes and 31 seconds.