The schema used for this example follows. Note that the large fact table, store_sales_cpa, is a columnar table (PARTITION BY (COLUMN...)).
CREATE MULTISET TABLE TPCDS.store_sales_cpa ,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, DATABLOCKSIZE = 262144 BYTES, CHECKSUM = DEFAULT, DEFAULT MERGEBLOCKRATIO, MAP = TD_MAP1 ( ss_sold_date_sk INTEGER, ss_sold_time_sk INTEGER, ss_item_sk INTEGER NOT NULL, ss_customer_sk INTEGER, ss_cdemo_sk INTEGER, ss_hdemo_sk INTEGER, ss_addr_sk INTEGER, ss_store_sk INTEGER, ss_promo_sk INTEGER, ss_ticket_number INTEGER NOT NULL, ss_quantity INTEGER, ss_wholesale_cost DECIMAL(7,2), ss_list_price DECIMAL(7,2), ss_sales_price DECIMAL(7,2), ss_ext_discount_amt DECIMAL(7,2), ss_ext_sales_price DECIMAL(7,2), ss_ext_wholesale_cost DECIMAL(7,2), ss_ext_list_price DECIMAL(7,2), ss_ext_tax DECIMAL(7,2), ss_coupon_amt DECIMAL(7,2), ss_net_paid DECIMAL(7,2), ss_net_paid_inc_tax DECIMAL(7,2), ss_net_profit DECIMAL(7,2), CONSTRAINT ss_i FOREIGN KEY ( ss_item_sk ) REFERENCES WITH NO CHECK OPTION TPCDS.item ( i_item_sk ), CONSTRAINT ss_d FOREIGN KEY ( ss_sold_date_sk ) REFERENCES WITH NO CHECK O PTION TPCDS.date_dim ( d_date_sk ), CONSTRAINT ss_c FOREIGN KEY ( ss_customer_sk ) REFERENCES WITH NO CHECK OP TION TPCDS.customer ( c_customer_sk )) PRIMARY AMP INDEX ( ss_item_sk ) PARTITION BY ( COLUMN NO AUTO COMPRESS ADD 10,RANGE_N(ss_sold_date_sk BET WEEN 2450800 AND 2453100 EACH 1 , NO RANGE OR UNKNOWN) ); CREATE MULTISET TABLE TPCDS.date_dim ,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, DATABLOCKSIZE = 262144 BYTES, CHECKSUM = DEFAULT, DEFAULT MERGEBLOCKRATIO, MAP = TD_MAP1 ( d_date_sk INTEGER NOT NULL, d_date_id CHAR(16) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL, d_date DATE FORMAT 'YY/MM/DD', d_month_seq INTEGER, d_week_seq INTEGER, d_quarter_seq INTEGER, d_year INTEGER, d_dow INTEGER, d_moy INTEGER, d_dom INTEGER, d_qoy INTEGER, d_fy_year INTEGER, d_fy_quarter_seq INTEGER, d_fy_week_seq INTEGER, d_day_name CHAR(9) CHARACTER SET LATIN NOT CASESPECIFIC, d_quarter_name CHAR(6) CHARACTER SET LATIN NOT CASESPECIFIC, d_holiday CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC, d_weekend CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC, d_following_holiday CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC, d_first_dom INTEGER, d_last_dom INTEGER, d_same_day_ly INTEGER, d_same_day_lq INTEGER, d_current_day CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC, d_current_week CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC, d_current_month CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC, d_current_quarter CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC, d_current_year CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC) PRIMARY INDEX ( d_date_sk ); CREATE MULTISET TABLE TPCDS.store ,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, DATABLOCKSIZE = 262144 BYTES, CHECKSUM = DEFAULT, DEFAULT MERGEBLOCKRATIO, MAP = TD_MAP1 ( s_store_sk INTEGER NOT NULL, s_store_id CHAR(16) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL, s_rec_start_date DATE FORMAT 'YY/MM/DD', s_rec_end_date DATE FORMAT 'YY/MM/DD', s_closed_date_sk INTEGER, s_store_name VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC, s_number_employees INTEGER, s_floor_space INTEGER, s_hours CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC, s_manager VARCHAR(40) CHARACTER SET LATIN NOT CASESPECIFIC, s_market_id INTEGER, s_geography_class VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC, s_market_desc VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC, s_market_manager VARCHAR(40) CHARACTER SET LATIN NOT CASESPECIFIC, s_division_id INTEGER, s_division_name VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC, s_company_id INTEGER, s_company_name VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC, s_street_number VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC, s_street_name VARCHAR(60) CHARACTER SET LATIN NOT CASESPECIFIC, s_street_type CHAR(15) CHARACTER SET LATIN NOT CASESPECIFIC, s_suite_number CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC, s_city VARCHAR(60) CHARACTER SET LATIN NOT CASESPECIFIC, s_county VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC, s_state CHAR(2) CHARACTER SET LATIN NOT CASESPECIFIC, s_zip CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC, s_country VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC, s_gmt_offset DECIMAL(5,2), s_tax_precentage DECIMAL(5,2)) PRIMARY INDEX ( s_store_sk ); CREATE MULTISET TABLE TPCDS.household_demographics ,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, DATABLOCKSIZE = 262144 BYTES, CHECKSUM = DEFAULT, DEFAULT MERGEBLOCKRATIO, MAP = TD_MAP1 ( hd_demo_sk INTEGER NOT NULL, hd_income_band_sk INTEGER, hd_buy_potential CHAR(15) CHARACTER SET LATIN NOT CASESPECIFIC, hd_dep_count INTEGER, hd_vehicle_count INTEGER) PRIMARY INDEX ( hd_demo_sk );
In the following query, note that the dimension tables must be filtered using WHERE clauses in order for the Optimizer to consider using an n-way join optimization. Step 3 of the EXPLAIN output shows the specific phrase that indicates use of an n-way join.
EXPLAIN SELECT ss_ticket_number, ss_customer_sk, store.s_city, Sum(ss_coupon_amt) amt, Sum(ss_net_profit) profit FROM store_sales, date_dim, store, household_demographics WHERE store_sales.ss_sold_date_sk = date_dim.d_date_sk AND store_sales.ss_store_sk = store.s_store_sk AND store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk AND ( household_demographics.hd_dep_count = 6 OR household_demographics.hd_vehicle_count > 2 ) AND date_dim.d_dow = 1 AND date_dim.d_year IN ( 1999, (SELECT 1999 + 1), (SELECT 1999 + 2) ) AND store.s_number_employees BETWEEN 200 AND 295 GROUP BY ss_ticket_number, ss_customer_sk, ss_addr_sk, store.s_city;
Result:
*** Help information returned. 67 rows.
*** Total elapsed time was 1 second.
Explanation
---------------------------------------------------------------------------
This request is eligible for incremental planning and execution (IPE).
The following is the static plan for the request.
1) First, we lock tpcds.store_sales_cpa in view store_sales in
TD_MAP1 for access, we lock TPCDS.store in TD_MAP1 for access, we
lock TPCDS.date_dim in TD_MAP1 for access, and we lock
TPCDS.household_demographics in TD_MAP1 for access.
2) Next, we execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step in TD_MAP1 from
TPCDS.date_dim by way of an all-rows scan with a condition of
("(TPCDS.date_dim.d_dow = 1) AND ((TPCDS.date_dim.d_year >=
1999) AND (TPCDS.date_dim.d_year <= 2001 ))") into Spool 3
(all_amps), which is duplicated on all AMPs in TD_Map1. Then
we do a SORT to partition by rowkey. The size of Spool 3 is
estimated with high confidence to be 11,304 rows (192,168
bytes). The estimated time for this step is 0.01 seconds.
2) We do an all-AMPs RETRIEVE step in TD_MAP1 from
TPCDS.household_demographics by way of an all-rows scan with
a condition of (
"(TPCDS.household_demographics.hd_vehicle_count >= 3) OR
(TPCDS.household_demographics.hd_dep_count = 6)") into Spool
4 (all_amps), which is duplicated on all AMPs in TD_Map1.
The size of Spool 4 is estimated with low confidence to be
224,640 rows (3,818,880 bytes). The estimated time for this
step is 0.01 seconds.
3) We do an all-AMPs RETRIEVE step in TD_MAP1 from TPCDS.store
by way of an all-rows scan with a condition of (
"(TPCDS.store.s_number_employees <= 295) AND
(TPCDS.store.s_number_employees >= 200)") into Spool 5
(all_amps), which is duplicated on all AMPs in TD_Map1. The
size of Spool 5 is estimated with high confidence to be
90,792 rows (3,540,888 bytes). The estimated time for this
step is 0.01 seconds.
3) We do an all-AMPs JOIN step in TD_Map1 from 3 left sources, which
is joined to 20718 combined partitions (9 column partitions) of
tpcds.store_sales_cpa in view store_sales using a 4-way dynamic
hash join. The left sources are:
a) Spool 3 (Last Use) by way of an all-rows scan, with a join
condition of ("ss_sold_date_sk =
tpcds.store_sales_cpa.d_date_sk") enhanced by dynamic
partition elimination.
b) Spool 4 (Last Use) by way of an all-rows scan, with a join
condition of ("ss_hdemo_sk = hd_demo_sk").
c) Spool 5 (Last Use) by way of an all-rows scan, with a join
condition of ("ss_store_sk = s_store_sk").
The result goes into Spool 2 (all_amps), which is built
locally on the AMPs. The result spool file will not be
cached in memory. The size of Spool 2 is estimated with low
confidence to be 280,646,263 rows (15,996,836,991 bytes).
The estimated time for this step is 24.38 seconds.
4) We do an all-AMPs SUM step in TD_Map1 to aggregate from Spool 2 by
way of an all-rows scan, grouping by field1 (ss_ticket_number
,ss_customer_sk ,ss_addr_sk ,s_city). Aggregate Intermediate
Results are computed globally, then placed in Spool 7 in TD_Map1.
The aggregate spool file will not be cached in memory. The size
of Spool 7 is estimated with low confidence to be 280,646,263 rows
(38,448,538,031 bytes). The estimated time for this step is 4
minutes and 44 seconds.
5) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 7 by way of
an all-rows scan into Spool 1 (group_amps), which is built locally
on the AMPs. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with low confidence to be
280,646,263 rows (33,958,197,823 bytes). The estimated time for
this step is 22.41 seconds.
6) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 5 minutes and 31 seconds.