16.10 - Scenario 4 - Teradata Database

Teradata Database Design

Product
Teradata Database
Release Number
16.10
Release Date
June 2017
Content Type
User Guide
Publication ID
B035-1094-161K
Language
English (United States)

Dealing With Ambiguous Scenarios

The previous scenarios illustrate situations where a row-partitioned table is the obvious choice to enhance the performance of a query workload. This scenario examines a more ambiguous situation in which there are more tradeoff considerations and it is not possible to determine in advance one correct solution for all specific instances of the scenario.

Table Structure and Update Schedule

An invoice table contains data about each invoice issued in the past four years. The unique primary index is invoice number. New rows are added nightly using the Teradata Parallel Transporter Update Operator, and the oldest month of data is deleted once each month.

Query Workload

A moderately heavy volume of queries requests information about one specified invoice. There are also ad hoc analysis queries that examine all invoices for some period of time, which is usually less than one year. Other tables have invoice number as their primary index, but do not have an invoice date column, so there are frequent joins with those other tables.

Problem Statement

The DBA is considering whether it would be advantageous to partition the invoice table on invoice date using one-month ranges.

The primary index is currently defined as unique, but would have to be redefined as nonunique if the table were row-partitioned, because the partitioning column, invoice date, is not part of the primary index. There is a business requirement to guarantee that invoice numbers are unique, so the DBA would need to define a uniqueness constraint on the invoice number column.

Adding this uniqueness constraint creates an additional secondary index on the table. Other than UPIs, all uniqueness constraints are implemented internally as USIs, irrespective of whether they are specified explicitly as a UNIQUE constraint, a PRIMARY KEY constraint, or a USI (see Using Unique Secondary Indexes to Enforce Row Uniqueness). The USI increases processing on insert, delete, and update operations, and requires additional disk capacity to store the resulting secondary index subtable. The base table is also larger by two bytes per row, further increasing the required disk space.

Analysis of Partitioning Benefits

The primary index access queries that were run against the nonpartitioned version of this table must be reformulated to use the USI to access the row. As a general rule, accessing a row takes roughly two to three times longer using a USI than it would using a UPI. However, UPI access is a very fast operation, so doubling or tripling the time might barely be noticeable to the users who issue those queries.
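To put the two-to-three-times rule of thumb in perspective, the following Python sketch converts an assumed UPI access time into the corresponding USI range. The 2 ms baseline and the function name are hypothetical and chosen only for illustration; they are not measured Teradata values.

```python
def usi_access_range_ms(upi_ms, low_factor=2.0, high_factor=3.0):
    """Return the (low, high) USI access time implied by the 2x-3x rule of thumb."""
    return upi_ms * low_factor, upi_ms * high_factor


# Hypothetical baseline: a single-row UPI access that takes 2 ms.
low, high = usi_access_range_ms(2.0)
print(f"USI access: {low:.1f} to {high:.1f} ms")  # USI access: 4.0 to 6.0 ms
```

Under this assumption, the added delay is a few milliseconds per lookup, which is why the slowdown might go unnoticed by interactive users even though the relative increase is large.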

The frequent joins with the other tables on invoice number are also affected. Without row partition elimination, direct Merge Joins against the partitioned table require, at best, more memory and CPU, and might be measurably slower than the same joins against a similar nonpartitioned table. The extent of performance degradation depends on the query conditions, how many partitions can be excluded, and the specific join plan chosen by the Optimizer. Actual measurement of representative queries is necessary to determine the overall difference in performance.

The nightly inserts benefit in the same way, and for the same reasons, as in Scenario 1. However, the additional index on invoice number partially offsets that benefit. The same considerations apply to the monthly delete operations.

The ad hoc queries examining several months of invoices benefit in the same way as in Scenario 1. The benefit is greatest when fewer months are examined.
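The relationship between the number of months examined and the benefit can be approximated with a short calculation. The Python sketch below assumes 48 one-month partitions (four years of data) with rows spread evenly across them; both are simplifying assumptions, and the function names are hypothetical. It estimates the share of the table that must be read when only the partitions touched by a date range are scanned.

```python
from datetime import date


def months_spanned(start, end):
    """Number of calendar-month partitions touched by an inclusive date range."""
    if end < start:
        raise ValueError("end must not precede start")
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


def fraction_scanned(start, end, total_partitions=48):
    """Approximate share of the table read after row partition elimination.

    Assumes rows are distributed evenly across the monthly partitions.
    """
    return min(months_spanned(start, end), total_partitions) / total_partitions


# A three-month analysis reads 3 of 48 partitions.
print(fraction_scanned(date(2017, 1, 1), date(2017, 3, 31)))  # 0.0625

# An eleven-month analysis reads 11 of 48 partitions.
print(round(fraction_scanned(date(2016, 7, 15), date(2017, 5, 15)), 3))  # 0.229
```

A nonpartitioned table would require a full-table scan in both cases, so under these assumptions the three-month query reads about 6 percent of the rows and the eleven-month query about 23 percent.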

Would it be worthwhile to convert the invoice table to a partitioned table? The DBA must measure the degree of improvement and the extent of degradation for each type of query, and then weight those measurements by how much each query type contributes to the overall workload involving this table. This exercise produces a good estimate of the comparative workload performance against the table with and without partitioning.
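One way to combine the measurements is a weighted average, sketched below in Python. Each query type is given its share of the total elapsed time on the nonpartitioned table and the ratio of its partitioned to nonpartitioned elapsed time. Every number in the example is a hypothetical placeholder, not a measurement, and the function name is an assumption made for illustration.

```python
def net_workload_change(workload):
    """Weighted average of elapsed-time ratios (partitioned / nonpartitioned).

    workload maps a query type to (share_of_baseline_time, time_ratio).
    Shares must sum to 1. A result below 1.0 means the partitioned table
    is faster overall; a result above 1.0 means it is slower.
    """
    total_share = sum(share for share, _ in workload.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("workload shares must sum to 1")
    return sum(share * ratio for share, ratio in workload.values())


# Hypothetical shares and ratios; replace them with measured values.
workload = {
    "single-invoice lookup": (0.30, 2.5),             # USI access instead of UPI
    "ad hoc date-range analysis": (0.40, 0.25),       # row partition elimination
    "joins on invoice number": (0.20, 1.2),           # costlier Merge Joins
    "nightly insert and monthly delete": (0.10, 0.8),
}

print(f"{net_workload_change(workload):.2f}")  # 1.17
```

With these placeholder values the partitioned table is about 17 percent slower overall, even though the ad hoc queries run four times faster. Different shares or ratios can reverse the result, which is why measurement of the actual workload is required.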

If the measured performance difference between the otherwise equivalent partitioned and nonpartitioned tables is substantial, in either direction, then the choice might appear to be obvious. However, you must also weigh the relative importance to the enterprise of the various activities in the workload.

For example, consider the following contingencies:

  • If the time required to perform the nightly volume of bulk inserts is beginning to exceed the time allotted for inserting new rows, then even a small improvement in load time might be considered sufficiently important to offset larger degradations in other aspects of the query workload.
  • Similarly, if the response time of the PI-access queries is critical, even a small performance degradation might be considered unacceptable, whether net workload performance is improved or not.
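The two contingencies above act as hard limits that take precedence over net workload performance. A minimal Python sketch of that decision rule follows: designs that violate a limit are discarded first, and only then is the remaining design with the lowest overall workload time selected. The class, the function, and all figures are hypothetical and serve only to illustrate the reasoning.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    name: str
    net_time_ratio: float        # total workload time relative to the current design
    nightly_load_minutes: float  # elapsed time of the nightly bulk insert
    lookup_ms: float             # response time of a single-invoice lookup


def choose_design(candidates, load_window_minutes, lookup_limit_ms):
    """Drop designs that break a hard limit, then pick the fastest remaining one."""
    feasible = [
        c for c in candidates
        if c.nightly_load_minutes <= load_window_minutes
        and c.lookup_ms <= lookup_limit_ms
    ]
    if not feasible:
        return None  # no design satisfies every requirement
    return min(feasible, key=lambda c: c.net_time_ratio)


# Hypothetical measurements for the two designs.
candidates = [
    Measurement("nonpartitioned", 1.00, 130.0, 2.0),
    Measurement("partitioned", 1.17, 100.0, 5.0),
]

# A 120-minute load window rules out the nonpartitioned table.
print(choose_design(candidates, load_window_minutes=120, lookup_limit_ms=10).name)
# partitioned

# A 3 ms lookup limit combined with the same window leaves no feasible design.
print(choose_design(candidates, load_window_minutes=120, lookup_limit_ms=3))
# None
```

In the first call the partitioned table is chosen despite its worse net workload time, because it is the only design that fits the load window. In the second call neither design meets both limits, which signals that the requirements themselves must be revisited.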

Conclusions

The decision whether to implement a table with or without partitioning is not always cut and dried, and the ultimate decision, like many others in physical database design, can often be more of an optimization than a maximization. In this scenario, careful and considered measurement, analysis, and evaluation are all required to make an optimal decision.