15.00 - Analysis of Row Partitioning Benefits - Teradata Database

Teradata Database Design

prodname
Teradata Database
vrm_release
15.00
category
User Guide
featnum
B035-1094-015K

Analysis of Row Partitioning Benefits

Thinking that row partitioning might be an easy way to optimize his workloads for this situation, the DBA converts the sales table into a partitioned table partitioned by transaction month and finds that many of his critical queries run faster with no significant negative tradeoffs.

The DDL for the redefined sales table looks like this:

    CREATE TABLE ppi_sales_table (
     product_code        CHARACTER(8),
     sales_date          DATE,
     agent_id            CHARACTER(8),
     quantity_sold       INTEGER,
     product_description VARCHAR(50))
    PRIMARY INDEX (product_code,sales_date,agent_id)
    PARTITION BY RANGE_N(sales_date BETWEEN DATE ’2001-06-01’
                                    AND     DATE ’2003-06-31’
                                    EACH INTERVAL ’1’ MONTH);

The following description examines each element of the workload as it relates to the newly redefined sales table, ppi_sales_table.

  • The monthly deletes are faster because instead of a monthly Teradata Parallel Transporter delete job, the DBA can now perform a simple monthly ALTER TABLE statement to do the following things:
  • a Drop the entire partition that contains the oldest data.

    b Create a few new partitions to hold data for the next few future months.

    Deleting all the rows in a partition is optimized in much the same way that deleting all the rows in a nonpartitioned primary index table is optimized. For example.

  • There is no need to record the individual rows in the transient journal as they are deleted.
  • The rows for the month being deleted are stored contiguously on each AMP instead of being scattered more or less evenly among all the data blocks of the table, so there are fewer blocks to read.
  • Most of the deletes are full-block deletes, so the block does not need to be rewritten.
  • There is no need to touch any of the rows for the other months.
  • Dropping the oldest partition set is a nearly instantaneous operation, assuming there are no USIs or join indexes that must be updated.

    The DDL for this ALTER TABLE statement looks like this for a selected monthly update:

         ALTER TABLE sales_table
         MODIFY PRIMARY INDEX (product_code, sales_date, agent_id)
         DROP RANGE BETWEEN DATE ’2001-06-01’
                    AND     DATE ’2001-06-30’
         ADD RANGE BETWEEN DATE ’2003-07-01’
                   AND     DATE ’2003-07-31’
         WITH DELETE;
  • The nightly Teradata Parallel Transporter insert job should run somewhat faster than before. Instead of the inserted rows distributing more or less evenly among all the data blocks of the table, they are concentrated in data blocks that correspond to the proper month. This increases the average hits per block count, which is a key measure of Teradata Parallel Transporter efficiency, as well as reducing the number of blocks that must be rewritten.
  • The short-running, single‑AMP queries are not affected by partitioning. Because the partitioning column is a component of the primary index, primary index access to the table is not changed.
  • The largest gain is seen in the ad hoc queries comparing current month sales to a prior month. Only two of the 25 partitions need to be read, instead of the full-table scan required for a nonpartitioned table. This means that the number of disk reads will be reduced by roughly 92%, with a corresponding reduction in elapsed query response time. The 92% figure applies to the step that reads the sales table, not to the sum of all the steps used to implement the query.
  • Given the stated assumptions, the other steps should take roughly the same amount of time to complete as they did previously.

  • The same considerations apply to the agent analysis queries. The number of partitions that must be read is determined by the time period specified in the query is known in most cases to be three or fewer. Even if the analysis is over twelve full months, there is roughly a 50% gain in reading twelve of 25 partitions for the step that reads the sales table.
  • The decision support queries that analyze 24 months of sales data take roughly the same time and resources as they did for a nonpartitioned primary index table, although there is a marginal gain realized by reading 24 instead of 25 partitions. If the analysis is for 24 months plus the current month (the entire table), then the resource usage is the same as with the nonpartitioned primary index incarnation of the table because full-table scans are not affected by partitioning the primary index.
  • The joins take the same amount of time. In the defined workload, the EXPLAIN reports indicate that there are no direct joins against the sales table. Instead, a spool file is always created from the sales table and the spool file is then joined to the smaller tables. In the partitioned table case, the spool file might be created more efficiently, but once created it is exactly the same as it is for the nonpartitioned table.