Analysis of Row Partitioning Benefits
Thinking that row partitioning might be an easy way to optimize his workloads for
this situation, the DBA converts the sales table into a partitioned table partitioned
by transaction month and finds that many of his critical queries run faster with no
significant negative tradeoffs.
The DDL for the redefined sales table looks like this:
CREATE TABLE ppi_sales_table (
product_code CHARACTER(8),
sales_date DATE,
agent_id CHARACTER(8),
quantity_sold INTEGER,
product_description VARCHAR(50))
PRIMARY INDEX (product_code,sales_date,agent_id)
PARTITION BY RANGE_N(sales_date BETWEEN DATE ’2001-06-01’
AND DATE ’2003-06-31’
EACH INTERVAL ’1’ MONTH);
The following description examines each element of the workload as it relates to the
newly redefined sales table, ppi_sales_table.
The monthly deletes are faster because instead of a monthly Teradata Parallel Transporter
delete job, the DBA can now perform a simple monthly ALTER TABLE statement to do the
following things:
a
Drop the entire partition that contains the oldest data.
b
Create a few new partitions to hold data for the next few future months.
Deleting all the rows in a partition is optimized in much the same way that deleting
all the rows in a nonpartitioned primary index table is optimized. For example.
There is no need to record the individual rows in the transient journal as they are
deleted.
The rows for the month being deleted are stored contiguously on each AMP instead of
being scattered more or less evenly among all the data blocks of the table, so there
are fewer blocks to read.
Most of the deletes are full-block deletes, so the block does not need to be rewritten.
There is no need to touch any of the rows for the other months.
Dropping the oldest partition set is a nearly instantaneous operation, assuming there
are no USIs or join indexes that must be updated.
The DDL for this ALTER TABLE statement looks like this for a selected monthly update:
ALTER TABLE sales_table
MODIFY PRIMARY INDEX (product_code, sales_date, agent_id)
DROP RANGE BETWEEN DATE ’2001-06-01’
AND DATE ’2001-06-30’
ADD RANGE BETWEEN DATE ’2003-07-01’
AND DATE ’2003-07-31’
WITH DELETE;
The nightly Teradata Parallel Transporter insert job should run somewhat faster than
before. Instead of the inserted rows distributing more or less evenly among all the
data blocks of the table, they are concentrated in data blocks that correspond to
the proper month. This increases the average hits per block count, which is a key
measure of Teradata Parallel Transporter efficiency, as well as reducing the number
of blocks that must be rewritten.
The short-running, single‑AMP queries are not affected by partitioning. Because the
partitioning column is a component of the primary index, primary index access to the
table is not changed.
The largest gain is seen in the ad hoc queries comparing current month sales to a
prior month. Only two of the 25 partitions need to be read, instead of the full-table
scan required for a nonpartitioned table. This means that the number of disk reads
will be reduced by roughly 92%, with a corresponding reduction in elapsed query response
time. The 92% figure applies to the step that reads the sales table, not to the sum
of all the steps used to implement the query.
Given the stated assumptions, the other steps should take roughly the same amount
of time to complete as they did previously.
The same considerations apply to the agent analysis queries. The number of partitions
that must be read is determined by the time period specified in the query is known
in most cases to be three or fewer. Even if the analysis is over twelve full months,
there is roughly a 50% gain in reading twelve of 25 partitions for the step that reads
the sales table.
The decision support queries that analyze 24 months of sales data take roughly the
same time and resources as they did for a nonpartitioned primary index table, although
there is a marginal gain realized by reading 24 instead of 25 partitions. If the analysis
is for 24 months plus the current month (the entire table), then the resource usage
is the same as with the nonpartitioned primary index incarnation of the table because
full-table scans are not affected by partitioning the primary index.
The joins take the same amount of time. In the defined workload, the EXPLAIN reports
indicate that there are no direct joins against the sales table. Instead, spool is
always created from the sales table and the spool is then joined to the smaller tables.
In the partitioned table case, the spool might be created more efficiently, but once
created it is exactly the same as it is for the nonpartitioned table.