16.20 - Rules and Usage Notes for Multilevel Partitioning - Teradata Database - Teradata Vantage NewSQL Engine

Teradata Vantage™ SQL Data Definition Language Detailed Topics

Product
Teradata Database
Teradata Vantage NewSQL Engine
Release Number
16.20
Release Date
March 2019
Content Type
Programming Reference
Publication ID
B035-1184-162K
Language
English (United States)
The following guidelines and rules apply to various aspects of tables defined with multilevel partitioning.
  • The order of partitioning expressions in the PARTITION BY clause does not affect the ability of the system to eliminate partitions, but does affect how effective the scan of partitions is. This should not be a concern if there are many rows (the word many in this case is defined to mean the rows span multiple data blocks) in each of the internal partitions.
    Consider the following 2 cases.
    • For a table with 65,535 combined partitions (65,535 partitions for the combined partitioning expression), 6.5535 x 109 rows per AMP, 100 byte rows, and 50 KB data blocks, each combined partition spans 200 to 201 data blocks (assuming rows are distributed evenly among the partitions). In this case, skipping over internal partitions should not incur too much overhead. All, or nearly all, of the rows in the data blocks read should qualify, and at least 199 data blocks are skipped between each set of data blocks read.
    • If the table only has 6.5535 x 106 rows per AMP, each combined partition has only about 0.2 data blocks. Assuming a worst case where only every other partition at the lowest level is eliminated, every data block must be read, and a full-table scan would be more efficient. The Optimizer does not cost the difference between these 2 choices. If it can do row partition elimination, it will, even if a full-table scan would be more efficient.

      If there are 5 partitions at the lowest level, and 4 of the 5 partitions are eliminated, every data block must still be read.

      If there are 6 partitions at the lowest level, and 5 of the 6 partitions are eliminated, some data blocks might be skipped, but probably not enough to overcome the overhead incurred, and this might be less efficient than a full-table scan.

      With a large number of partitions at the lowest level as well as a large number of partitions being eliminated at the lowest level, enough additional data blocks might be skipped that the overhead incurred can be overcome, with the result possibly being more efficient than a full-table scan.

      This case is not a good use of multilevel partitioning. Instead, you should consider an alternative partitioning that either has multiple data blocks for each internal partition or the internal partition is not populated. A more useful partitioning scheme is one that defines fewer levels of partitioning, or fewer partitions per level, with the lowest level having the most partitions so there are more rows per combined partition.

      For example, if 1 level was partitioned in intervals of 1 day, changing the interval to 1 week or 1 month might be better.

      Row partition elimination at lower levels in the row partition hierarchy requires more skipping, which could cause more I/O as well as increasing the CPU path length.

      To obtain the best performance, you should place partitioning expressions that are more likely to evoke row partition elimination for common workloads at higher levels in the row partitioning hierarchy. Those partitioning expressions that are not likely to evoke row partition elimination should either be placed at lower levels in the hierarchy or eliminated altogether.

      You should also consider placing the row partitioning expression with the highest number of row partitions at the lowest level in the partition hierarchy. Recall, however, that the ordering of row partitioning expressions might not be a significant factor if there are many data blocks per combined partition.

      Note that if the number of partitions is expected to be altered for a RANGE_N partitioning expression, that partitioning expression must be specified at the highest level of the hierarchy.

      You can use the following query to find the average number of rows per combined partition.

           SELECT AVG(pc)
           FROM (SELECT COUNT(*) AS pc
                 FROM t
                 GROUP BY PARTITION) AS pt;

      You can use the following query to find the average number of data blocks per combined partition.

           USING (b FLOAT, r FLOAT)
           SELECT (:r/:b) * AVG(pc)
           FROM (SELECT COUNT(*) AS pc
                 FROM t
                 GROUP BY PARTITION) AS pt;

      where:

      Syntax element … Specifies the …
      b average block size.
      r row size.
  • Multilevel partitioning typically defines a large number of partitions for the combined partitioning expression. If there are a large number of populated partitions for the combined partitioning expression, performance of primary index access, joins, and aggregations on the primary index might be degraded.

    As a result, multilevel partitioning might be an appropriate choice when these operations are rarely performed without also obtaining significant row partition elimination. Note, however, that a large number of partitions for the combined partitioning expression reduces the number of data blocks that must be processed when a small number of partitions needs to be accessed because of row partition elimination.

  • If you attempt to insert or update a row of the table with a multilevel partitioning such that any one of the partitioning expressions for that row evaluates to null, the insert or update request (in Teradata session mode) or transaction (in ANSI session mode) aborts and the system returns an error message to the requestor.
  • If you want the table to be defined in such a way that all valid rows can be inserted and so all its rows are updatable, you must construct each partitioning expression in such a way that a row with any value or null for its partitioning columns is assigned to some partition number such that the RANGE_N or CASE_N function does not return null.

    You can use the NO RANGE [OR UNKNOWN] and UNKNOWN options and asterisks in the RANGE_N function and NO CASE [OR UNKNOWN] and UNKNOWN options in the CASE_N function to assist in constructing such partitioning expressions.

    Note that you should not use these options to construct partitioning expressions that permit the inclusion of rows that should not be in the table. By having partitions defined for them, such rows can have a negative effect on query performance.

    Use of these options and asterisks can also prevent the effective use of the ALTER TABLE statement to alter the partitioning of the table. See ALTER TABLE (Basic Table Parameters).

  • Expression evaluation errors such as divide by zero can occur during evaluation of the partitioning expressions, causing ANSI mode requests or Teradata mode transactions to be rolled back.

    You should construct your partitioning expressions to avoid such errors.

  • Any excess partitions are reserved for level 1 and cannot be added to any other level. That is, the maximum number of partitions for level 1 for 2-byte partitioning is the following.


    The maximum number of partitions for level 1 for 8-byte partitioning is the following.



    where:

    Equation element … Specifies
    the floor, or lower truncated value, of the expression.
    d i the number of partitions defined at level i.

    For some cases, level 1 can be altered to have between 2 and the maximum number of partitions.