Thinking that row partitioning may be an easy way to optimize workloads for this situation, the DBA converts the sales table into a partitioned table partitioned by transaction month and finds that critical queries run faster with no significant negative tradeoffs.
The DDL for the redefined sales table looks like this:
CREATE TABLE ppi_sales_table ( product_code CHARACTER(8), sales_date DATE, agent_id CHARACTER(8), quantity_sold INTEGER, product_description VARCHAR(50)) PRIMARY INDEX (product_code,sales_date,agent_id) PARTITION BY RANGE_N(sales_date BETWEEN DATE '2001-06-01' AND DATE '2003-06-31' EACH INTERVAL '1' MONTH);
The following description examines each element of the workload with respect to the newly redefined sales table, ppi_sales_table.
- The monthly deletes are faster because instead of a monthly Teradata Parallel Transporter delete job, the DBA can now perform a simple monthly ALTER TABLE statement to do the following things:
- Drop the entire partition that contains the oldest data.
- Create new partitions to hold data for future months.
Deleting all the rows in a partition is optimized in much the same way that deleting all the rows in a nonpartitioned primary index table is optimized. For example:
- There is no need to record the rows being deleted in the transient journal.
- The rows for the month being deleted are stored contiguously on each AMP instead of being scattered more or less evenly among all the data blocks of the table, so there are fewer blocks to read.
- Most of the deletes are full-block deletes, so the block need not be rewritten.
- There is no need to touch any of the rows for the other months.
Dropping the oldest partition set is instantaneous, assuming there are no USIs or join indexes that must be updated.
The DDL for this ALTER TABLE statement looks like this for a selected monthly update:
ALTER TABLE sales_table MODIFY PRIMARY INDEX (product_code, sales_date, agent_id) DROP RANGE BETWEEN DATE '2001-06-01' AND DATE '2001-06-30' ADD RANGE BETWEEN DATE '2003-07-01' AND DATE '2003-07-31' WITH DELETE;
- The nightly Teradata Parallel Transporter insert job runs faster. The inserted rows are concentrated in data blocks that correspond to the proper month, instead of distributing more or less evenly among all the data blocks of the table. This increases the average hits per block count, which is a key measure of Teradata Parallel Transporter efficiency and reducing the number of blocks that must be rewritten.
- The short-running, single-AMP queries are not affected by partitioning. Because the partitioning column is a component of the primary index, primary index access to the table is not changed.
- The largest gain is seen in the ad hoc queries comparing current month sales to a prior month. Only two of the 25 partitions need to be read, instead of the full-table scan required for a nonpartitioned table. Therefore, the number of disk reads is reduced by roughly 92%, with a corresponding reduction in elapsed query response time. The 92% figure applies to the step that reads the sales table, not to the sum of all the steps used to implement the query.
Given the stated assumptions, the time that the other steps take to complete does not change significantly.
- The same considerations apply to the agent analysis queries. The number of partitions that must be read is determined by the time period specified in the query is known in most cases to be three or fewer. Even if the analysis is over twelve full months, there is roughly a 50% gain in reading twelve of 25 partitions for the step that reads the sales table.
- The decision support queries that analyze 24 months of sales data take approximately the same time and resources as for a nonpartitioned primary index table, with a marginal gain realized by reading 24 instead of 25 partitions. If the analysis is for 24 months plus the current month (the entire table), then the resource usage is the same as with the nonpartitioned primary index incarnation of the table because full-table scans are not affected by partitioning the primary index.
- The joins take the same amount of time. In the defined workload, the EXPLAIN reports indicate that there are no direct joins against the sales table. Instead, spool is created from the sales table and the spool is then joined to the smaller tables. In the partitioned table case, the spool may be created more efficiently, but is otherwise the same as for the nonpartitioned table.