This example creates a simple row-compressed join index based on the following customer and orders tables.
CREATE TABLE customer ( c_custkey INTEGER, c_name CHARACTER(26) not null, c_address VARCHAR(41), c_nationkey INTEGER, c_phone CHARACTER(16), c_acctbal DECIMAL(13,2), c_mktsegment CHARACTER(21), c_comment VARCHAR(127)) PRIMARY INDEX( c_custkey ); CREATE TABLE orders ( o_orderkey INTEGER, o_date DATE FORMAT 'yyyy-mm-dd', o_status CHARACTER(1), o_custkey INTEGER, o_totalprice DECIMAL(13,2), o_orderpriority CHARACTER(21), o_clerk CHARACTER(16), o_shippriority INTEGER, o_comment VARCHAR(79)) UNIQUE PRIMARY INDEX(o_orderkey);
Assume that you frequently submit this join query on customer and orders.
SELECT o_custkey, c_name, o_status, o_date, o_comment FROM orders, customer WHERE o_custkey = c_custkey;
Without a join index, a typical execution plan for this query would redistribute the orders table into a spool file, sort the spool on o_custkey, and then doing a merge join between the spool file and the customer table.
If a join index like ord_cust_idx were defined on customer and orders, the resulting execution plan could just scan the join index.
The fixed column set in the join index definition is o_custkey and c_name (corresponding results rows highlighted in orange) , and the repeating column set in the join index definition is o_status, o_date, and o_comment, (corresponding results rows highlighted in blue).
CREATE JOIN INDEX ord_cust_idx AS SELECT (o_custkey, c_name ), ( o_status, o_date, o_comment ) FROM orders, customer WHERE o_custkey = c_custkey;
The way ord_cust_idx is defined makes it a row-compressed join index, with the fixed column set being o_custkey and c_name and the repeating column set being o_status, o_date, and o_comment.
Assume that the base tables customer and orders contain the following data.
customer | ||
c_custkey | c_name | c_address |
100 | Louis | Rancho Bernardo |
101 | Pekka | Helsinki |
102 | Felipe | San Juan |
orders | ||||
o_orderkey | o_date | o_status | o_custkey | o_comment |
5000 | 2011-10-01 | S | 102 | rush order |
5001 | 2011-10-01 | S | 100 | big order |
5002 | 2011-10-03 | D | 102 | delayed |
5003 | 2011-10-05 | U | ? | unknown customer |
5004 | 2011-10-05 | S | 100 | credit |
5005 | 2011-10-08 | P | 101 | discount |
The corresponding rows in the row-compressed join index ord_cust_idx are the following.
ord_cust_idx | |
fixed columns | repeated columns |
100 Louis | S 2011-10-01 big order 2011-10-05 credit |
101 Pekka | P 2011-10-08 discount |
102 Felipe | S 2011-10-03 rush order D 2011-10-03 delayed |
Notice that the fixed set columns are contained in the first column of ord_cust_idx and the repeated set columns are contained in the second column of ord_cust_idx , just as the join index definition had defined.
A drawback of join results generated from inner joins is that unmatched rows are not preserved. In the first part of this example, information about order 5003 is not stored in ord_cust_idx , thus limiting the types of queries that could be resolved using the index. To make join indexes applicable to as many queries as possible, you should define your join indexes using outer joins whenever possible. This enables a join index to satisfy queries with fewer join conditions than those used to generate the index.
Suppose you changed the definition of ord_cust_idx to use an outer join as the following CREATE JOIN INDEX request does.
CREATE JOIN INDEX ord_cust_idx AS SELECT (o_custkey, c_name), (o_status, o_date, o_comment) FROM orders LEFT OUTER JOIN customer ON o_custkey = c_custkey;
The resulting rows in ord_cust_idx are the following:
ord_cust_idx | |
fixed columns | repeated columns |
100 Louis | S 2011-10-01 big order S 2011-10-05 credit |
101 Pekka | P 2011-10-08 discount |
102 Felipe | S 2011-10-03 rush order D 2011-10-03 delayed |
? ? | U 2011-10-05 unknown customer |
By redefining ord_cust_idx using a left outer join, the following query could also be resolved using only the join index.
SELECT o_status, o_date, o_comment FROM orders;