15.00 - Basing a Partitioning Expression on a RANGE_N Function - Teradata Database

Teradata Database Design

Teradata Database
User Guide

Basing a Partitioning Expression on a RANGE_N Function

The RANGE_N function is provided to simplify the specification of common partitioning expressions where each partition contains a range of numeric data, and is especially useful when the column contains a date or timestamp. RANGE_N returns an INTEGER determined by the first range that includes the column value, numbered from 1, that can be mapped directly to a partition number or can be further modified to calculate the partition number. RANGE_N is commonly used to define partitioning expressions. When you use a RANGE_N function to define a single‑level partitioning expression, two partition numbers, NO RANGE and UNKNOWN, are reserved for specific uses.

Assume a table with 7 years of order data, ranging from 2001 through 2007. The next partitioning expression creates 84 partitions, one for each month of the period covered by the data:

     PRIMARY INDEX (order_number)
     PARTITION BY RANGE_N (order_date BETWEEN DATE '2001-01-01' 
                                      AND     DATE '2007-12-31'
                                      EACH INTERVAL '1' MONTH);

Each partition contains roughly the same number of rows, assuming that order volume has stayed roughly constant across the seven year interval. Neither a NO RANGE nor an UNKNOWN partition is defined because this partitioning expression definition is for data that only has dates within the ranges specified in the partitioning expression.

It is frequently desirable to have each partition contain roughly the same number of rows, but it is not required. The next example puts the older orders into partitions with coarser granularity, and the newer orders into partitions with finer granularity:

     PRIMARY INDEX (order_number)
     PARTITION BY RANGE_N (order_date BETWEEN DATE '1994-01-01' 
                                      AND     DATE '1997-12-31' 
                                      EACH INTERVAL '2' YEAR,
                                              DATE '1998-01-01' 
                                      AND     DATE '2000-12-31' 
                                      EACH INTERVAL '1' YEAR,
                                              DATE '2001-01-01' 
                                      AND     DATE '2003-12-31' 
                                      EACH INTERVAL '6' MONTH,   
                                              DATE '2004-01-01' 
                                      AND     DATE '2007-12-31' 
                                              EACH INTERVAL '1' MONTH,       
                                              NO RANGE, UNKNOWN);

In this example, the more recent data is partitioned more finely than the older data. This can be a good strategy if you know that the older data is rarely accessed except as part of a full‑table scan, because it reduces some potential disadvantages by defining a smaller number of partitions. However, maintaining this structure over extended epochs of time is not as simple as maintaining a structure in which each interval covers the same time duration. In this example, the years 2004 through 2007 are partitioned by month. As time passes, and 2004 data becomes older and less frequently referenced, it shall become necessary to repartition the table if the pattern of defining longer time intervals for older data is to be maintained.

It is both easy and fast to add and drop partitions from the ends of a partitioning definition, but repartitioning intervals in the middle partitions requires much more work, and it is usually faster to reload the data. In addition, when a range partition is dropped, any rows in that partition are moved from the dropped range partition to the NO RANGE partition or to an added range partition. Also, when a range is added, rows might need to be moved from the NO RANGE partition to the new range partition.

Some expansion room is allowed for future dates by specifying the final partition as extending to the end of 2007. It is easy to add and drop ranges to the end of a partitioning expression using the ALTER TABLE statement. While you could have specified an ending date far in the future, such as 2099‑12‑31, it is generally not desirable to define hundreds of partitions that shall not be used for decades.

The example shows intervals of years and months. It is also possible to partition by day (EACH INTERVAL '1' DAY) or by week (EACH INTERVAL '7' DAY). A seven-day interval can start on any day of the week, so if you want to start the weekly intervals on Sunday, for example, the beginning date should be chosen so that it falls on a Sunday. Also, the last range might be less than specified by the EACH clause. For example, suppose that with seven-day intervals the first date falls on Sunday, but the last date also falls on Sunday, in which case the last range spans only one day.

The RANGE_N NO RANGE clause is comparable to the CASE_N NO CASE clause, and the UNKNOWN clause has the same meaning for both functions.

As with the CASE_N partitioning expression, the Optimizer knows how many partitions are defined when RANGE_N is used as the partitioning expression.

The following example revisits the division_number example, this time using a RANGE_N function:

     PRIMARY INDEX (invoice_number)
     PARTITION BY RANGE_N (division_number BETWEEN 1 
                                           AND     4 
                                           EACH    1);

This alternate partitioning expression allows you to specify only four partitions instead of 65,535. The Optimizer might be able to choose certain join plans and more accurately cost those plans when the maximum number of partitions is both known and small, making this a better choice than using the column directly. This RANGE_N partitioning expression also prevents rows with non‑valid division numbers from being inserted into the table.