15.00 - Workload Characteristics and Row Partitioning - Teradata Database

Teradata Database Design

prodname
Teradata Database
vrm_release
15.00
category
User Guide
featnum
B035-1094-015K

Workload Characteristics and Row Partitioning

Consider using row partitioning when your workloads have any of the following characteristics:

  • The number of queries in workloads that access the table have a range constraint, particularly a date constraint on some column of the table.
  • Queries have an equality constraint on some column of the table, and that column is either not the only primary index column or it is not a primary index column at all.
  • If there is a primary index that is used only, or principally, to achieve an even distribution of rows, but not usually for accessing or joining rows, and access is frequently made on a column that is suitable for partitioning.
  • If there is a primary index that is used to achieve an even distribution of rows as well as for accessing or joining rows, and columns suitable for partitioning are included in the primary index definition.
  • If there is a primary index that is used to achieve an even distribution of rows as well as for accessing or joining rows, but columns suitable for partitioning are not included in the primary index definition.
  • This might be a good candidate for partitioning, but you need to pay particular attention to weighing the performance tradeoffs that often result in this situation.

  • Use the RANGE_N function for a partitioning expression, preferably on a column with a DATE or TIMESTAMP data type, because it generally provides more opportunities for row partition elimination, and the Optimizer knows the exact number of defined row partitions.
  • For instance, instead of the following row partitioning expression.

         PARTITION BY column

    use

         PARTITION BY RANGE_N(column BETWEEN m 
                                     AND     n 
                                     EACH    s)

    Dates and timestamps are often used in query conditions and therefore makes good candidates for a partitioning expression.

    When partitioning on a DATE column, use RANGE_N with a single overall range divided into ranges of equal size as follows.

         PARTITION BY RANGE_N(date_column BETWEEN DATE '...' 
                                          AND     DATE '...' 
                                          EACH INTERVAL 's' t)
  • Use DATE constants or TIMESTAMP constants such as DATE '2011-08-06' or TIMESTAMP '2011-08-25 10:14:59' to specify the ranges in a partitioning expression. This is not only easier to read, making it clear that the expression is defined on a date, but it also removes the dependence on the FORMAT used in the implicit conversion to a date. You can also specify TIMESTAMP(n) WITH TIME ZONE constants for RANGE_N partitioning.
  • For example, you can specify a RANGE_N‑based partitioning expression like the following.

         RANGE_N(ts BETWEEN TIMESTAMP '2003-01-01 00:00:00+13:00'
                    AND     TIMESTAMP '2009-12-31 23:59:59-12:59' 
                    EACH INTERVAL '1' MONTH)

    Use an INTERVAL constant in the EACH clause where the variable t is DAY, MONTH, YEAR, or YEAR TO MONTH.

    Do not use INTEGER values or CHARACTER constants for dates in your partitioning expressions, since they can easily be incorrectly specified such that they do convert to the date you expected.

    For example, it might seem intuitive to simply partition by the name of a DATE column as follows:

         PARTITION BY sales_date

    This form does not produce a syntax error. In fact, it works correctly only for dates in the early 1900s and follows the rule of implicit conversion to get an INTEGER partition number, but such a table is not generally useful.

    For this case, you should instead use RANGE_N with a granularity of EACH INTERVAL '1' DAY.

    It might also seem intuitive to specify something like the following to indicate that a date column in the primary index is to be partitioned by week:

         PARTITION BY 7

    However, this form does produce a syntax error.

    For this case, use RANGE_N with a granularity of EACH INTERVAL '7' DAY.

  • Consider specifying only as many date ranges as are currently needed plus a few additional ranges for the future.
  • By limiting ranges to those that are currently needed, you help the Optimizer to better cost plans and also allow for more efficient primary index access, joins, and aggregations when the partitioning column is not included in the primary index.

    This is not as important if you collect current PARTITION statistics.

    A good guideline is to define 10% or fewer of the partitions to be empty to be able to handle future dates. You should also define enough future ranges to minimize the frequency of ALTER TABLE statements needed to drop and add ranges. However, if you make changes too infrequently, you might forget to alter the table entirely.

    Altering the table only once a year is probably not a good idea for the following reasons:

  • Because you must create too many empty partitions.
  • Because it is too easy to forget to alter the partitioning ranges if you do not do it fairly regularly.
  • Because you fail to follow the procedure often enough to prevent problems from occurring when you finally get around to following it.
  • You must balance these concerns of having enough future partitions, but not too many.

  • RANGE_N permits a faster partitioning change using ALTER TABLE … DROP RANGES or … ADD RANGES.
  • Reference a single INTEGER or DATE column in the RANGE_N function.
  • Do not use an expression such as in place of a simple column reference in a partitioning expression constructed from a RANGE_N function even if the expression only references a single column.

  • Define ranges of equal size using EACH to specify the granularity of the partition.
  • Multiple ranges, with or without specifying an EACH granularity, require more CPU time to execute and different sized ranges can prevent fast partitioning changes.

  • Consider not specifying the NO RANGE, NO RANGE OR UNKNOWN, or UNKNOWN partitions in the RANGE_N function.
  • Using these partitions can degrade query performance because queries can be forced to scan data in these partitions unnecessarily. If you specify the NO RANGE, NO RANGE OR UNKNOWN, or UNKNOWN partitions, you can also affect the performance of ALTER TABLE partitioning change negatively because data might need to be moved among the partitions. An UNKNOWN partition is not needed if the partitioning cannot produce rows with unknown partition values (this is often the case when a partitioning column is specified to be NOT NULL).

  • Deciding not to use a RANGE_N function for a partitioning expression can be a good choice in the following instances:
  • You do not partition the table or join index on a DATE column.
  • You use constant or USING variable equality conditions on the partitioning columns in a majority of the queries in your workloads to specify a single partition.
  • If the assumption made by the Optimizer that the table (not based on a CASE_N partitioning expression) has 65,535 partitions provides good plans in these cases, you do not need to alter partitioning, and the system produces the plans and data maintenance performance your site requires.