15.00 - Analysis of Partitioning Benefits - Teradata Database

Teradata Database Design

Teradata Database
User Guide

Analysis of Partitioning Benefits

There are two candidate schemes for partitioning this table. The first partitioning scheme is to partition the rows by call_start, probably defining one partition for each day. The results of this partitioning scheme for this table and query workload look like this.

  • The timestamp partitioning scheme helps with inserting new transaction activity in the same way as the previous scenario (“Scenario 1” on page 427).
  • Little performance gain is realized for row deletion because the delete operations are not performed strictly by call date. Because of this, the deleted rows are not clustered in a single partition.
  • The analysis queries based on call_start benefit from this partitioning scheme, with those queries that specify a range of a few days realizing the greatest gain.
  • The second possibility for partitioning this table is to use the telephone number. Telephone numbers contain too many digits to assign each number its own partition, but a subset of the digits can be used profitably to optimize telephone number‑based row access. If, for example, the table is partitioned on the high order three digits of the telephone number, there are 1,000 partitions, some of which are always be empty because of the way telephone companies assign numbers. The results of this partitioning scheme for this table and query workload look like this:

  • This partitioning scheme does not improve the performance of Teradata Parallel Transporter bulk inserts or deletes because they are scattered across all partitions.
  • The scheme does not facilitate date-based queries.
  • The scheme allows queries that specify a telephone number to run much faster than they otherwise would because only one partition out of 500 or more partitions must be read to access the rows having that number.
  • The scheme benefits geographic area analysis, at least in North America, because the first three digits of North American telephone numbers uniquely identify a narrowly defined geographic region.
  • If 1,000 partitions improve performance, then defining 10,000 partitions using the first four digits of the telephone number would probably improve performance even more. If 10,000 partitions are good, then 50,000 partitions might be better yet.

    The DDL for the redefined call_detail table looks like this:

         CREATE TABLE ppi_call_detail (
           phone_number     DECIMAL(10) NOT NULL,
           call_start       TIMESTAMP,
           call_duration    INTEGER,
           call_description VARCHAR(30))
         PRIMARY INDEX (phone_number, call_start)
         PARTITION BY RANGE_N(phone_number/100000 (INTEGER) BETWEEN 0
                                                            AND 99999
                                                            EACH    2);

    If mapping a geographic area to one or more partitions does not solve an application problem, then another potential solution is to maximize the number of partitions by using telephone number modulo 65,535 as the partitioning expression. Assuming the cardinality of the table is roughly 3.276 billion rows, then the average partition contains roughly 50,000 rows with this scheme. If the system has 100 AMPS, then each AMP contains roughly 500 rows per partition, a number that fits into one data block if the row width is fairly narrow. The decrease in response time of a single-partition scan for all activity for a particular telephone number is dramatic compared to the full-table scan that would be required for a nonpartitioned table.

    A query to return activity for one telephone number from this table is a best case scenario for single‑table response time improvement by using row partitioning. Disregarding the overhead cost of initiating the query and returning the answer set, the elapsed time could be reduced to  of the time using a nonpartitioned table. Including the query initiation and termination overhead, the total query time improvement would be somewhat less than a factor of 65,535, but could be less than of the nonpartitioned primary index time. The DDL for the ppi_call_detail table using this scheme is as follows.

         CREATE TABLE ppi_call_detail (
           phone_number  DECIMAL(10)NOT NULL,
           call_start    TIMESTAMP,
           call_duration INTEGER,
           other_columns CHARACTER(30))
         PRIMARY INDEX (phone_number, call_start)
         PARTITION BY phone_number MOD 65535 +1;

    The workload mix determines which, if any, of these proposed partitioning schemes is best. You can begin your analysis with the extended logical data model, but actual testing of the various proposed scenarios is often required.