RANGE_N
Purpose
Evaluates an expression and maps the result into one of a list of specified ranges and returns the position of the range in the list.
Syntax
where:
Syntax element … |
Specifies … |
test_expression |
an expression that results in a BYTEINT, SMALLINT, INTEGER, DATE, CHAR, VARCHAR, GRAPHIC or VARGRAPHIC data type. |
start_expression |
a literal or literal expression that defines the starting boundary of a range. The data type of start_expression must be the same as the data type of test_expression, or must be such that it can be implicitly cast to the same data type as test_expression. For a starting boundary without an ending boundary, the range is defined by its starting boundary (and the starting boundary is included in the range), up to but not including the starting boundary of the next range. Use an asterisk ( * ) for the starting boundary of the first range in the list to indicate the lowest possible value (all values and NULL are greater than a starting boundary specified as an asterisk). An asterisk is compatible with any data type. |
end_expression |
a literal or literal expression that defines the ending boundary of a range. The data type of end_expression must be the same as the data type of test_expression, or must be such that it can be implicitly cast to the same data type as test_expression. The last range in the list must specify an ending boundary. For all other ranges, if an ending boundary is not specified, the range is defined by its starting boundary (and the starting boundary is included in the range), up to but not including the starting boundary of the next range. Use an asterisk ( * ) for the ending boundary of the last range in the list to indicate the highest possible value (all values and NULL are less than an ending boundary specified as an asterisk). |
EACH range_size |
a literal or literal expression with a value greater than zero. A range that specifies an EACH phrase is equivalent to a series of ranges, where the first range in the series starts at start_expression, and subsequent ranges start at start_expression + (range_size * n), where n starts at one and increments by one while start_expression + (range_size * n) is less than or equal to end_expression, or less than the next start_expression in the list of ranges. For DATE types, the calculation of valid dates in subsequent ranges uses ADD_MONTHS instead of the + arithmetic operator. For more information on ADD_MONTHS, see “ADD_MONTHS” on page 856. The data type of range_size must be compatible for adding to test_expression. Note: If the data type of test_expression is a character type (CHAR, VARCHAR, GRAPHIC or VARGRAPHIC), you cannot specify the EACH phrase. |
NO RANGE |
an optional range to handle a test_expression that does not map into any of the specified ranges. |
OR UNKNOWN |
an option to use with NO RANGE. The NO RANGE OR UNKNOWN option handles a test_expression that does not map into any of the specified ranges, or a test_expression that evaluates to NULL when RANGE_N does not specify the range BETWEEN * AND *. |
UNKNOWN |
an option to handle a test_expression that evaluates to NULL when RANGE_N does not specify the range BETWEEN * AND *. |
ANSI Compliance
This is a Teradata extension to the ANSI SQL:2011 standard.
Range Definition
A range is defined by a starting boundary and an optional ending boundary. If an ending boundary is not specified, the range is defined by its starting boundary, inclusively, up to but not including the starting boundary of the next range.
The list of ranges must specify ranges in increasing order, where the ending boundary of a range is less than the starting boundary of the next range.
Evaluation
RANGE_N evaluates test_expression and determines whether the result is within a range in the list of ranges. The position of the first range is one and the positions of subsequent ranges increment by one up to n, where n is the total number of ranges.
IF … |
THEN … |
||||||
the result of test_expression is within a range |
RANGE_N returns the position of the range.
|
||||||
the result of test_expression is NULL |
|
||||||
If RANGE_N does not specify one of the following: RANGE_N returns NULL. If RANGE_N specifies the range BETWEEN * AND *, RANGE_N returns 1, regardless of whether NO RANGE, NO RANGE OR UNKNOWN, or UNKNOWN is specified. If RANGE_N does not specify the range BETWEEN * AND * and |
|||||||
test_expression is outside all the ranges in the list |
If NO RANGE or NO RANGE OR UNKNOWN is specified, RANGE_N returns n + 1. If neither NO RANGE nor NO RANGE OR UNKNOWN is specified, RANGE_N returns NULL. |
Result Type and Attributes
The data type, format, and title for RANGE_N are as follows.
Data Type |
Format |
Title |
INTEGER |
Default format of the INTEGER data type |
<RANGE_N function> |
For information on default data type formats, see SQL Data Types and Literals.
Using RANGE_N to Define Partitioned Primary Indexes
The primary index for a table or join index controls the distribution of the data for that table or join index across the AMPs, as well as its retrieval. If the primary index is a partitioned primary index (PPI), the data can be assigned to user-defined partitions on the AMPs.
To define a primary index for a table or join index, you specify the PRIMARY INDEX phrase in the CREATE TABLE or CREATE JOIN INDEX data definition statement. To define a partitioned primary index, you include the PARTITION BY phrase when you define the primary index.
The PARTITION BY phrase requires one or more partitioning expressions that determine the partition assignment of a row. You can use RANGE_N to construct a partitioning expression such that a row with any value or NULL for the partitioning columns is assigned to some partition.
You can also use CASE_N to construct a partitioning expression. For more information, see “CASE_N” on page 124.
If the PARTITION BY phrase specifies a list of partitioning expressions, the PPI is a multilevel PPI, where each partition for a level is subpartitioned according to the next partitioning expression in the list. Unlike the partitioning expression for a single-level PPI, which can consist of any valid SQL expression (with some exceptions), each expression in the list of partitioning expressions for a multilevel PPI must be a CASE_N or RANGE_N function.
Using RANGE_N with CURRENT_DATE or CURRENT_TIMESTAMP in a PPI
You can define a partitioning expression that uses RANGE_N with the built-in functions CURRENT_DATE or CURRENT_TIMESTAMP. Use of CURRENT_DATE or CURRENT_TIMESTAMP in a partitioning expression is most appropriate when the data must be partitioned as one or more current partitions and one or more history partitions where the current and history partitions are based on the resolved CURRENT_DATE or CURRENT_TIMESTAMP in the partitioning expression. This allows you to periodically reconcile the table to move older data from the current partition into one or more history partitions using the ALTER TABLE TO CURRENT statement instead of redefining the partitioning using explicit dates which must be determined each time the ALTER TABLE DROP/ADD RANGE is done.
For more information, see “Rules and Guidelines for Optimizing the Reconciliation of RANGE_N PPI Expressions Based On Updatable Current Date and Updatable Current Timestamp” in SQL Data Definition Language Detailed Topics.
Using RANGE_N with Character Data
You can specify character expressions (CHAR, VARCHAR, GRAPHIC or VARGRAPHIC) as the test_expression and/or the range boundaries in a RANGE_N function. The following usage rules apply:
Note: All character string comparisons involving graphic data are case specific.
Note that the pad character might not collate to the lowest code point in the collation. For a range boundary of length n, if the test_expression precisely matches that range boundary for the first n characters, but contains a character that collates less than the pad character at position n+1, then the test_expression will collate less than the range boundary. See “Example 10” on page 165.
Restrictions
If RANGE_N appears in a PARTITION BY phrase, it:
If RANGE_N is used in a partitioning expression for a multilevel PPI, it must define at least two partitions.
If RANGE_N specifies CURRENT_DATE or CURRENT_TIMESTAMP in a partitioning expression, you cannot use ALTER TABLE to add or drop ranges for the table. You must use the ALTER TABLE TO CURRENT statement to achieve this function.
Using a UDT as the Test Expression
The test_expression should not be an expression that results in a UDT data type. An error is reported if this occurs when RANGE_N is used to define a PPI. If RANGE_N is not used to define a PPI, you should explicitly cast the expression so that it is BYTEINT, SMALLINT, INTEGER, DATE, CHAR, VARCHAR, GRAPHIC or VARGRAPHIC instead of depending upon any implicit conversions.
Example
Here is an example that uses RANGE_N and the value of the totalorders column to define the partition to which a row is assigned:
CREATE TABLE orders
(storeid INTEGER NOT NULL
,productid INTEGER NOT NULL
,orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL
,totalorders INTEGER)
PRIMARY INDEX (storeid, productid)
PARTITION BY RANGE_N(totalorders BETWEEN *, 100, 1000 AND *,
UNKNOWN);
In the example, RANGE_N specifies four partitions to which a row can be assigned, based on the value of the totalorders column:
Partition Number |
Condition |
1 |
The value of the totalorders column is less than 100. |
2 |
The value of the totalorders column is less than 1000, but greater than or equal to 100. |
3 |
The value of the totalorders column is greater than or equal to 1000. |
4 |
The totalorders column is NULL, so the range is UNKNOWN. |
Example
Here is an example that modifies “Example 1” to use RANGE_N in a list of partitioning expressions that define a multilevel PPI:
CREATE TABLE orders
(storeid INTEGER NOT NULL
,productid INTEGER NOT NULL
,orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL
,totalorders INTEGER NOT NULL)
PRIMARY INDEX (storeid, productid)
PARTITION BY (RANGE_N(totalorders BETWEEN *, 100, 1000 AND *)
,RANGE_N(orderdate BETWEEN *, '2005-12-31' AND *) );
The example defines six partitions to which a row can be assigned. The first RANGE_N expression defines three partitions based on the value of the totalorders column. The second RANGE_N expression subdivides each of the three partitions into two partitions based on the value of the orderdate column.
Level 1 Partition Number |
Level 2 Partition Number |
Condition |
1
|
1 |
The value of the totalorders column is less than 100 and the value of the orderdate column is less than '2005‑12‑31'. |
2 |
The value of the totalorders column is less than 100 and the value of the orderdate column is greater than or equal to '2005‑12‑31'. |
|
2
|
1 |
The value of the totalorders column is less than 1000 but greater than or equal to 100, and the value of the orderdate column is less than '2005‑12‑31'. |
2 |
The value of the totalorders column is less than 1000 but greater than or equal to 100, and the value of the orderdate column is greater than or equal to '2005‑12‑31'. |
|
3
|
1 |
The value of the totalorders column is greater than or equal to 1000 and the value of the orderdate column is less than '2005‑12‑31'. |
2 |
The value of the totalorders column is greater than or equal to 1000 and the value of the orderdate column is greater than or equal to '2005‑12‑31'. |
Example
Here is an example that defines a partitioned primary index that specifies one partition to which rows are assigned, for any value of the totalorders column, including NULL:
CREATE TABLE orders
(storeid INTEGER NOT NULL
,productid INTEGER NOT NULL
,orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL
,totalorders INTEGER)
PRIMARY INDEX (storeid, productid)
PARTITION BY RANGE_N(totalorders BETWEEN * AND *);
Example
The following example shows the count of rows in each partition if the table were to be partitioned using the RANGE_N expression.
CREATE TABLE orders
(orderkey INTEGER NOT NULL
,custkey INTEGER
,orderdate DATE FORMAT 'yyyy-mm-dd')
PRIMARY INDEX (orderkey);
INSERT INTO orders (1, 100, '1998-01-01');
INSERT INTO orders (2, 100, '1998-04-01');
INSERT INTO orders (3, 109, '1998-04-01');
INSERT INTO orders (4, 101, '1998-04-10');
INSERT INTO orders (5, 100, '1998-07-01');
INSERT INTO orders (6, 109, '1998-07-10');
INSERT INTO orders (7, 101, '1998-08-01');
INSERT INTO orders (8, 101, '1998-12-01');
INSERT INTO orders (9, 111, '1999-01-01');
INSERT INTO orders (10, 111, NULL);
The RANGE_N expression in the following SELECT statement uses the EACH phrase to define a series of 12 ranges, where the first range starts at '1998-01-01' and the ranges that follow have starting boundaries that increment sequentially by one month intervals.
SELECT COUNT(*),
RANGE_N(orderdate
BETWEEN DATE '1998-01-01' AND DATE '1998-12-31'
EACH INTERVAL '1' MONTH
) AS Partition_Number
FROM orders
GROUP BY Partition_Number
ORDER BY Partition_Number;
The results look like this:
Count(*) Partition_Number
----------- ----------------
2 ?
1 1
3 4
2 7
1 8
1 12
Example
The following example creates a table with partitioning defined using a RANGE_N expression involving the END bound function. The table creates 10 partitions where each partition represents the sales history for one year.
CREATE TABLE SalesHistory
(product_code CHAR (8),
quantity_sold INTEGER,
transaction_period PERIOD (DATE))
PRIMARY INDEX (product_code)
PARTITION BY
RANGE_N (END (transaction_period) BETWEEN date'2006-01-01'
AND date '2015-12-31' EACH INTERVAL'1' YEAR);
The following SELECT statement scans five partitions of the sales history before the year 2010.
SELECT *
FROM SalesHistory
WHERE transaction_period < period (date'2010-01-01');
Example : Start_expression with CURRENT_DATE
If CURRENT_DATE or CURRENT_TIMESTAMP is specified in the start_expression of the first range in RANGE_N, and if this start_expression when resolved with a new CURRENT_DATE or CURRENT_TIMESTAMP falls on a partition boundary, then all partitions prior to the partition matched are dropped. Otherwise, the entire table is re-partitioned with the new partitioning expression.
Consider the following CREATE TABLE statement submitted on April 1, 2006:
CREATE TABLE ppi (i INT, j DATE)
PRIMARY INDEX (i)
PARTITION BY
RANGE_N (j BETWEEN CURRENT_DATE AND
CURRENT_DATE + INTERVAL '1' YEAR - INTERVAL '1' DAY
EACH INTERVAL '1' MONTH);
The last resolved date is April 1, 2006. If you submit an ALTER TABLE TO CURRENT statement on June 1, 2006, the start_expression, newly resolved to CURRENT_DATE ('2006-06-01'), falls on a partition boundary of the third partition. Therefore, partitions 1 and 2 are dropped, and the last reconciled date is set to the newly resolved CURRENT_DATE.
However, if you submitted the ALTER TABLE TO CURRENT statement on June 10, 2006 instead of June 1, 2006, the start_expression, newly resolved to CURRENT_DATE ('2006-06-10'), does not fall on a partition boundary. Therefore, all rows are scanned and the rows are repartitioned based on the new partitioning expression. The partition boundary after this statement aligns with the 10th day of the month instead of the earlier 1st day of the month.
Example
The following table definition is created in the year 2007 (the current year at the time). The table is partitioned to record 5 years of order history plus orders for the current year and one future year.
CREATE TABLE Orders
(o_orderkey INTEGER NOT NULL,
o_custkey INTEGER,
o_orderstatus CHAR(1) CASESPECIFIC,
o_totalprice DECIMAL(13,2) NOT NULL,
o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL,
o_orderpriority CHAR(21),
o_comment VARCHAR(79))
PRIMARY INDEX (o_orderkey)
PARTITION BY RANGE_N(
o_orderdate BETWEEN DATE '2002-01-01' AND DATE '2008-12-31'
EACH INTERVAL '1' MONTH)
UNIQUE INDEX (o_orderkey);
If, in 2008, you want to alter the table such that it continues to maintain 5 years of history plus the current year and one future year, you can submit the following statement in 2008:
ALTER TABLE Orders MODIFY PRIMARY INDEX (o_orderkey)
DROP RANGE WHERE PARTITION BETWEEN 1 AND 12
ADD RANGE BETWEEN DATE '2009-01-01' AND DATE '2009-12-31'
EACH INTERVAL '1' MONTH
WITH DELETE;
In this case, you must compute the new dates and specify them explicitly in the ADD RANGE clause. This requires manual intervention every year the statement is submitted.
Alternatively, you can define the table using CURRENT_DATE as follows. This makes it easier to alter the partitioning.
CREATE TABLE Orders
(o_orderkey INTEGER NOT NULL,
o_custkey INTEGER,
o_orderstatus CHAR(1) CASESPECIFIC,
o_totalprice DECIMAL(13,2) NOT NULL,
o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL,
o_orderpriority CHAR(21),
o_comment VARCHAR(79))
PRIMARY INDEX (o_orderkey)
PARTITION BY RANGE_N(o_orderdate BETWEEN
CAST(((EXTRACT(YEAR FROM CURRENT_DATE)-5-1900)*10000+0101) AS DATE)
AND
CAST(((EXTRACT(YEAR FROM CURRENT_DATE)+1-1900)*10000+1231) AS DATE)
EACH INTERVAL '1' MONTH)
UNIQUE INDEX (o_orderkey);
You can schedule the following ALTER TABLE statement to occur yearly. This statement rolls the partition window forward by efficiently dropping and adding partitions.
ALTER TABLE Orders TO CURRENT WITH DELETE;
With the use of CURRENT_DATE, you do not need to modify the ALTER TABLE statement each time you want to repartition the data based on the new dates.
In both cases, the partitioning starts on a year boundary. In the first example, the ALTER TABLE statement does not change this, so partitioning continues to start on a year boundary. However, you can specify an ALTER TABLE statement that changes the partitioning to start on a different boundary. For example, you can roll forward to start on a particular month in a year by specifying the desired dates in the ALTER TABLE statement.
In the second example, which uses CURRENT_DATE, you can only roll forward to start on a year boundary. However, you can modify the example as follows so that partitioning can be used to roll forward to start at the beginning of a month. This case assumes that, as of the CREATE TABLE date, the Orders table will contain the last 71 months of history plus the current month and 12 months in the future (a total of 84 months).
CREATE TABLE Orders
(o_orderkey INTEGER NOT NULL,
o_custkey INTEGER,
o_orderstatus CHAR(1) CASESPECIFIC,
o_totalprice DECIMAL(13,2) NOT NULL,
o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL,
o_orderpriority CHAR(21),
o_comment VARCHAR(79))
PRIMARY INDEX (o_orderkey)
PARTITION BY RANGE_N(o_orderdate BETWEEN
CAST(((EXTRACT(YEAR FROM CURRENT_DATE)-1900)*10000 +
EXTRACT(MONTH FROM CURRENT_DATE)*100 + 01) AS DATE) -
INTERVAL '71' MONTH
AND
CAST(((EXTRACT(YEAR FROM CURRENT_DATE)+1-1900)*10000 +
EXTRACT(MONTH FROM CURRENT_DATE)*100 + 01) AS DATE)+
INTERVAL '13' MONTH - INTERVAL '1' DAY
EACH INTERVAL '1' MONTH)
UNIQUE INDEX (o_orderkey);
You can schedule the following ALTER TABLE statement to occur monthly or less frequently (but before running out of future months). This statement rolls the partition window forward by dropping and adding partitions so that the Orders table continues to contain the last 71 months of history plus the current month and 12 months in the future.
ALTER TABLE Orders TO CURRENT WITH DELETE;
You can define the following simpler partitioning, but it might not be optimized, and the entire table might be scanned to reconcile rows when you submit an ALTER TABLE TO CURRENT statement. This case assumes that, as of the CREATE TABLE date, the Orders table will contain about 2,191 days of history plus the current day and about 365 days in the future (a total of about 7 years).
CREATE TABLE Orders
(o_orderkey INTEGER NOT NULL,
o_custkey INTEGER,
o_orderstatus CHAR(1) CASESPECIFIC,
o_totalprice DECIMAL(13,2) NOT NULL,
o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL,
o_orderpriority CHAR(21),
o_comment VARCHAR(79))
PRIMARY INDEX (o_orderkey)
PARTITION BY RANGE_N(o_orderdate BETWEEN
CURRENT_DATE - INTERVAL '6' YEAR
AND
CURRENT_DATE + INTERVAL '1' YEAR
EACH INTERVAL '1' MONTH)
UNIQUE INDEX (o_orderkey);
You can schedule the following ALTER TABLE statement to occur daily or less frequently (but before running out of future days). This statement rolls the partition window forward by dropping and adding partitions only if the CURRENT_DATE is the same day of the month as the day when the last CREATE TABLE or ALTER TABLE TO CURRENT statement was submitted. Otherwise, the entire table is scanned to reconcile the rows.
ALTER TABLE Orders TO CURRENT WITH DELETE;
This can be very inefficient if the ALTER TABLE statement is not submitted on the same day of the month as the day when the last CREATE TABLE or ALTER TABLE TO CURRENT statement was submitted. Performance degrades as the number of days between the last resolved date and the new resolved date increases due to the increasing number of rows that must be moved.
For example, if the last resolved date was January 1, 2008, and the next ALTER TABLE TO CURRENT statement is submitted on February 2, 2008, all the rows of the table will be moved to new partitions.
Example
The following example defines 5 ranges. The session collation is ASCII.
RANGE_N(animal BETWEEN *, 'ape', 'bird', 'bull' AND 'cow',
'dog' AND *, NO RANGE, UNKNOWN)
where:
Range |
Includes... |
1 |
all values less than 'ape'. |
2 |
strings greater than or equal to 'ape' and less than 'bird'. |
3 |
strings greater than or equal to 'bird' and less than 'bull'. |
4 |
strings between 'bull' and 'cow'. |
5 |
strings greater than or equal to 'dog'. |
If the value of animal matches one of the defined ranges, RANGE_N returns the number of the matched range.
If the value of animal is greater than 'cow' but less than 'dog', it does not match any of the ranges, so RANGE_N returns 6 because NO RANGE is specified.
If the value of animal is NULL, RANGE_N returns 7 because UNKNOWN is specified.
Example
The following example defines 5 ranges. The session collation is ASCII.
RANGE_N(animal BETWEEN *, 'ape', 'bird', 'bull' AND 'cow',
'dog' AND *, UNKNOWN)
where:
Range |
Includes... |
1 |
all values less than 'ape'. |
2 |
strings greater than or equal to 'ape' and less than 'bird'. |
3 |
strings greater than or equal to 'bird' and less than 'bull'. |
4 |
strings between 'bull' and 'cow'. |
5 |
strings greater than or equal to 'dog'. |
If the value of animal matches one of the defined ranges, RANGE_N returns the number of the matched range.
If the value of animal is greater than 'cow' but less than 'dog', it does not match any of the ranges, so RANGE_N returns NULL because NO RANGE is not specified.
If the value of animal is NULL, RANGE_N returns 6 because UNKNOWN is specified.
Example
In this example, the session collation is ASCII when submitting the CREATE TABLE statement, and the pad character is <space>. The example defines two ranges (numbered 1 and 2):
CREATE SET TABLE t2
(a VARCHAR(10) CHARACTER SET UNICODE NOT CASESPECIFIC,
b INTEGER)
PRIMARY INDEX (a)
PARTITION BY RANGE_N(a BETWEEN 'a','b' AND 'c');
The following INSERT statement inserts a character string consisting of a single <tab> character between the 'b' and '1'.
INSERT t2 ('b 1', 1);
The following INSERT statement inserts a character string consisting of a single <space> character between the 'b' and '1'.
INSERT t2 ('b 1', 2);
The following SELECT statement shows the result of the INSERT statements. Since the <tab> character has a lower code point than the <space> character, the first string inserted maps to partition 1.
SELECT PARTITION, a, b FROM t2 ORDER BY 1;
*** Query completed. 2 rows found. 3 columns returned.
*** Total elapsed time was 1 second.
PARTITION a b
----------- ------ -----
1 b 1 1 (string contains single <tab> character)
2 b 1 2 (string contains single <space> character)
Related Topics
For information on … |
See … |
PPI properties and performance considerations |
Database Design. |
PPI considerations and capacity planning |
|
specifying a PPI for a table |
CREATE TABLE in SQL Data Definition Language. |
specifying a PPI for a join index |
CREATE JOIN INDEX in SQL Data Definition Language. |
modifying the partitioning of the primary index for a table |
ALTER TABLE in SQL Data Definition Language. |
the reconciliation of the partitioning based on newly resolved CURRENT_DATE and CURRENT_TIMESTAMP values |
ALTER TABLE TO CURRENT in SQL Data Definition Language |