Analysis of Partitioning Benefits - Teradata VantageCloud Lake

Lake - Database Reference

Deployment
VantageCloud
Edition
Lake
Product
Teradata VantageCloud Lake
Release Number
Published
February 2025
ft:locale
en-US
ft:lastEdition
2025-11-21
dita:mapPath
ohi1683672393549.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
ohi1683672393549
There are two candidate schemes for partitioning this table. The first partitioning scheme is to partition the rows by call_start, defining one partition for each day. The results of this partitioning scheme for this table and query workload look like this.
  • The timestamp partitioning scheme helps with inserting new transaction activity in the same way as the previous scenario (Scenario 1).
  • Little performance gain is realized for row deletion because the delete operations are not performed strictly by call date. Therefore, the deleted rows are not clustered in a single partition.
  • The analysis queries based on call_start benefit from this partitioning scheme, especially those that specify a range of a few days.
The second possibility for partitioning this table is to use the telephone number. Telephone numbers have too many digits to assign each number its own partition, but a subset of the digits can be used to optimize telephone number-based row access. For example, if the table is partitioned on the high-order three digits of the telephone number, there are 1,000 partitions, including empty ones, because of the way telephone companies assign numbers. The results of this partitioning scheme for this table and query workload look like this:
  • This partitioning scheme does not improve the performance of Teradata Parallel Transporter bulk inserts or deletes because the inserted rows are scattered across all partitions.
  • The scheme does not facilitate date-based queries.
  • The scheme allows queries that specify a telephone number to run much faster, because only one partition out of 500 or more must be read to access the rows having that number.
  • The scheme benefits geographic area analysis in North America, because the first three digits of North American telephone numbers uniquely identify a narrowly defined geographic region.

If 1,000 partitions improve performance, then defining 10,000 partitions using the first four digits of the telephone number typically improves performance more. If 10,000 partitions are good, then 50,000 partitions may be better.

The DDL for the redefined call_detail table looks like this:

     CREATE TABLE ppi_call_detail (
       phone_number     DECIMAL(10) NOT NULL,
       call_start       TIMESTAMP,
       call_duration    INTEGER,
       call_description VARCHAR(30))
     PRIMARY INDEX (phone_number, call_start)
     PARTITION BY RANGE_N(phone_number/100000 (INTEGER) BETWEEN 0
                                                        AND 99999
                                                        EACH    2);

If mapping a geographic area to one or more partitions does not solve an application problem, another potential solution is maximizing the number of partitions by using telephone number modulo 65,535 as the partitioning expression.

Assuming the cardinality of the table is 3.276 billion rows, the average partition contains 50,000 rows with this scheme. If the system has 100 AMPS, each AMP contains 500 rows per partition, a number that fits into one data block if the row width is narrow. The decrease in response time of a single-partition scan for all activity for a telephone number is dramatic compared to the full-table scan required for a nonpartitioned table.

A query to return activity for one telephone number from this table is a best case scenario for single-table response time improvement by using row partitioning. Disregarding the overhead cost of initiating the query and returning the answer set, the elapsed time may be reduced to  1/65535 of the time using a nonpartitioned table. Including the query initiation and termination overhead, the total query time improvement is less than a factor of 65,535, but may be less than 1/1,000 of the nonpartitioned primary index time. The DDL for the ppi_call_detail table using this scheme is as follows.

     CREATE TABLE ppi_call_detail (
       phone_number  DECIMAL(10)NOT NULL,
       call_start    TIMESTAMP,
       call_duration INTEGER,
       other_columns CHARACTER(30))
     PRIMARY INDEX (phone_number, call_start)
     PARTITION BY phone_number MOD 65535 +1;

The workload mix determines which proposed partitioning scheme is best. You can begin your analysis with the extended logical data model, but often you must test the proposed scenarios.