- The timestamp partitioning scheme helps with inserting new transaction activity in the same way as the previous scenario (Scenario 1).
- Little performance gain is realized for row deletion because the delete operations are not performed strictly by call date. Therefore, the deleted rows are not clustered in a single partition.
- The analysis queries based on call_start benefit from this partitioning scheme, especially those that specify a range of a few days.
- This partitioning scheme does not improve the performance of Teradata Parallel Transporter bulk inserts or deletes because the inserted rows are scattered across all partitions.
- The scheme does not facilitate date-based queries.
- The scheme allows queries that specify a telephone number to run much faster, because only one partition out of 500 or more must be read to access the rows having that number.
- The scheme benefits geographic area analysis in North America, because the first three digits of North American telephone numbers uniquely identify a narrowly defined geographic region.
If 1,000 partitions improve performance, then defining 10,000 partitions using the first four digits of the telephone number typically improves performance more. If 10,000 partitions are good, then 50,000 partitions may be better.
The DDL for the redefined call_detail table looks like this:
CREATE TABLE ppi_call_detail ( phone_number DECIMAL(10) NOT NULL, call_start TIMESTAMP, call_duration INTEGER, call_description VARCHAR(30)) PRIMARY INDEX (phone_number, call_start) PARTITION BY RANGE_N(phone_number/100000 (INTEGER) BETWEEN 0 AND 99999 EACH 2);
If mapping a geographic area to one or more partitions does not solve an application problem, another potential solution is maximizing the number of partitions by using telephone number modulo 65,535 as the partitioning expression.
Assuming the cardinality of the table is 3.276 billion rows, the average partition contains 50,000 rows with this scheme. If the system has 100 AMPS, each AMP contains 500 rows per partition, a number that fits into one data block if the row width is narrow. The decrease in response time of a single-partition scan for all activity for a telephone number is dramatic compared to the full-table scan required for a nonpartitioned table.
A query to return activity for one telephone number from this table is a best case scenario for single-table response time improvement by using row partitioning. Disregarding the overhead cost of initiating the query and returning the answer set, the elapsed time may be reduced to 1/65535 of the time using a nonpartitioned table. Including the query initiation and termination overhead, the total query time improvement is less than a factor of 65,535, but may be less than 1/1,000 of the nonpartitioned primary index time. The DDL for the ppi_call_detail table using this scheme is as follows.
CREATE TABLE ppi_call_detail ( phone_number DECIMAL(10)NOT NULL, call_start TIMESTAMP, call_duration INTEGER, other_columns CHARACTER(30)) PRIMARY INDEX (phone_number, call_start) PARTITION BY phone_number MOD 65535 +1;
The workload mix determines which proposed partitioning scheme is best. You can begin your analysis with the extended logical data model, but often you must test the proposed scenarios.