16.20 - Examples - Teradata Vantage NewSQL Engine

Teradata Vantage™ SQL Functions, Expressions, and Predicates

prodname
Teradata Database
Teradata Vantage NewSQL Engine
vrm_release
16.20
category
Programming Reference
featnum
B035-1145-162K

Example 1: Specifying Four Partitions to which a Row Can Be Assigned

Here is an example that uses RANGE_N and the value of the totalorders column to define the partition to which a row is assigned:

   CREATE TABLE orders 
    (storeid INTEGER NOT NULL
    ,productid INTEGER NOT NULL
    ,orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL
    ,totalorders INTEGER)
    PRIMARY INDEX (storeid, productid) 
     PARTITION BY RANGE_N(totalorders BETWEEN *, 100, 1000 AND *,
                         UNKNOWN);

In the example, RANGE_N specifies four partitions to which a row can be assigned, based on the value of the totalorders column:

Partition Number Condition
1 The value of the totalorders column is less than 100.
2 The value of the totalorders column is less than 1000, but greater than or equal to 100.
3 The value of the totalorders column is greater than or equal to 1000.
4 The totalorders column is NULL, so the range is UNKNOWN.

Example 2: Using RANGE_N in a List of Partitioning Expressions

Here is an example that modifies “Examples” to use RANGE_N in a list of partitioning expressions that define a multilevel PPI:

   CREATE TABLE orders 
    (storeid INTEGER NOT NULL
    ,productid INTEGER NOT NULL
    ,orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL
    ,totalorders INTEGER NOT NULL)
    PRIMARY INDEX (storeid, productid) 
     PARTITION BY (RANGE_N(totalorders BETWEEN *, 100, 1000 AND *)
                  ,RANGE_N(orderdate BETWEEN *, '2005-12-31' AND *) );

The example defines six partitions to which a row can be assigned. The first RANGE_N expression defines three partitions based on the value of the totalorders column. The second RANGE_N expression subdivides each of the three partitions into two partitions based on the value of the orderdate column.

Level 1 Partition Number Level 2 Partition Number Condition
1 1 The value of the totalorders column is less than 100 and the value of the orderdate column is less than '2005-12-31'.
2 The value of the totalorders column is less than 100 and the value of the orderdate column is greater than or equal to '2005-12-31'.
2 1 The value of the totalorders column is less than 1000 but greater than or equal to 100, and the value of the orderdate column is less than '2005-12-31'.
2 The value of the totalorders column is less than 1000 but greater than or equal to 100, and the value of the orderdate column is greater than or equal to '2005-12-31'.
3 1 The value of the totalorders column is greater than or equal to 1000 and the value of the orderdate column is less than '2005-12-31'.
2 The value of the totalorders column is greater than or equal to 1000 and the value of the orderdate column is greater than or equal to '2005-12-31'.

Example 3: Defining a Partitioned Primary Index that Specifies One Partition

Here is an example that defines a partitioned primary index that specifies one partition to which rows are assigned, for any value of the totalorders column, including NULL:

   CREATE TABLE orders 
    (storeid INTEGER NOT NULL
    ,productid INTEGER NOT NULL
    ,orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL
    ,totalorders INTEGER)
    PRIMARY INDEX (storeid, productid) 
     PARTITION BY RANGE_N(totalorders BETWEEN * AND *);

Example 4: Counting Rows in Each Partition if the Table is Partitioned Using the RANGE_N Expression

The following example shows the count of rows in each partition if the table were to be partitioned using the RANGE_N expression.

   CREATE TABLE orders 
    (orderkey INTEGER NOT NULL
    ,custkey INTEGER
    ,orderdate DATE FORMAT 'yyyy-mm-dd')
    PRIMARY INDEX (orderkey);
   
   INSERT INTO orders (1,  100, '1998-01-01');
   INSERT INTO orders (2,  100, '1998-04-01');
   INSERT INTO orders (3,  109, '1998-04-01');
   INSERT INTO orders (4,  101, '1998-04-10');
   INSERT INTO orders (5,  100, '1998-07-01');
   INSERT INTO orders (6,  109, '1998-07-10');
   INSERT INTO orders (7,  101, '1998-08-01');
   INSERT INTO orders (8,  101, '1998-12-01');
   INSERT INTO orders (9,  111, '1999-01-01');
   INSERT INTO orders (10, 111, NULL);

The RANGE_N expression in the following SELECT statement uses the EACH phrase to define a series of 12 ranges, where the first range starts at '1998-01-01' and the ranges that follow have starting boundaries that increment sequentially by one month intervals.

   SELECT COUNT(*), 
          RANGE_N(orderdate 
                  BETWEEN DATE '1998-01-01' AND DATE '1998-12-31'
                  EACH INTERVAL '1' MONTH
          ) AS Partition_Number
   FROM orders 
   GROUP BY Partition_Number 
   ORDER BY Partition_Number;

The results look like this:

      Count(*) Partition_Number
   ----------- ----------------
             2                ?
             1                1
             3                4
             2                7
             1                8
             1               12

Example 5: Table Partitioning Using a RANGE_N Expression

The following example creates a table with partitioning defined using a RANGE_N expression involving the END bound function. The table creates 10 partitions where each partition represents the sales history for one year.

   CREATE TABLE SalesHistory
     (product_code CHAR (8),
      quantity_sold INTEGER,
      transaction_period PERIOD (DATE))
     PRIMARY INDEX (product_code)
     PARTITION BY
        RANGE_N (END (transaction_period) BETWEEN date'2006-01-01'
                 AND date '2015-12-31' EACH INTERVAL'1' YEAR);

The following SELECT statement scans five partitions of the sales history before the year 2010.

   SELECT * 
   FROM SalesHistory 
   WHERE transaction_period < period (date'2010-01-01');

Example 6: Start_expression with CURRENT_DATE

If CURRENT_DATE or CURRENT_TIMESTAMP is specified in the start_expression of the first range in RANGE_N, and if this start_expression when resolved with a new CURRENT_DATE or CURRENT_TIMESTAMP falls on a partition boundary, then all partitions prior to the partition matched are dropped. Otherwise, the entire table is re-partitioned with the new partitioning expression.

Consider the following CREATE TABLE statement submitted on April 1, 2006:

   CREATE TABLE ppi (i INT, j DATE)
   PRIMARY INDEX (i) 
   PARTITION BY 
      RANGE_N (j BETWEEN CURRENT_DATE AND
               CURRENT_DATE + INTERVAL '1' YEAR - INTERVAL '1' DAY
               EACH INTERVAL '1' MONTH);

The last resolved date is April 1, 2006. If you submit an ALTER TABLE TO CURRENT statement on June 1, 2006, the start_expression, newly resolved to CURRENT_DATE ('2006-06-01'), falls on a partition boundary of the third partition. Therefore, partitions 1 and 2 are dropped, and the last reconciled date is set to the newly resolved CURRENT_DATE.

However, if you submitted the ALTER TABLE TO CURRENT statement on June 10, 2006 instead of June 1, 2006, the start_expression, newly resolved to CURRENT_DATE ('2006-06-10'), does not fall on a partition boundary. Therefore, all rows are scanned and the rows are repartitioned based on the new partitioning expression. The partition boundary after this statement aligns with the 10th day of the month instead of the earlier 1st day of the month.

Example 7: Table Partitioning to Record History

The following table definition is created in the year 2007 (the current year at the time). The table is partitioned to record 5 years of order history plus orders for the current year and one future year.

   CREATE TABLE Orders
     (o_orderkey INTEGER NOT NULL,
      o_custkey INTEGER,
      o_orderstatus CHAR(1) CASESPECIFIC,
      o_totalprice DECIMAL(13,2) NOT NULL,
      o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL,
      o_orderpriority CHAR(21),
      o_comment VARCHAR(79))
   PRIMARY INDEX (o_orderkey)
   PARTITION BY RANGE_N(
      o_orderdate BETWEEN DATE '2002-01-01' AND DATE '2008-12-31' 
      EACH INTERVAL '1' MONTH)
   UNIQUE INDEX (o_orderkey);

If, in 2008, you want to alter the table such that it continues to maintain 5 years of history plus the current year and one future year, you can submit the following statement in 2008:

   ALTER TABLE Orders MODIFY PRIMARY INDEX (o_orderkey)
      DROP RANGE WHERE PARTITION BETWEEN 1 AND 12
      ADD RANGE BETWEEN DATE '2009-01-01' AND DATE '2009-12-31' 
         EACH INTERVAL '1' MONTH
   WITH DELETE;

In this case, you must compute the new dates and specify them explicitly in the ADD RANGE clause. This requires manual intervention every year the statement is submitted.

Alternatively, you can define the table using CURRENT_DATE as follows. This makes it easier to alter the partitioning.

 CREATE TABLE Orders
     (o_orderkey INTEGER NOT NULL,
     o_custkey INTEGER,
     o_orderstatus CHAR(1) CASESPECIFIC,
     o_totalprice DECIMAL(13,2) NOT NULL,
     o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL,
     o_orderpriority CHAR(21),
     o_comment VARCHAR(79))
   PRIMARY INDEX (o_orderkey)
   PARTITION BY RANGE_N(o_orderdate BETWEEN
      CAST(((EXTRACT(YEAR FROM CURRENT_DATE)-5-1900)*10000+0101) AS DATE)
      AND
      CAST(((EXTRACT(YEAR FROM CURRENT_DATE)+1-1900)*10000+1231) AS DATE)
      EACH INTERVAL '1' MONTH)
   UNIQUE INDEX (o_orderkey);

You can schedule the following ALTER TABLE statement to occur yearly. This statement rolls the partition window forward by efficiently dropping and adding partitions.

ALTER TABLE Orders TO CURRENT WITH DELETE;

With the use of CURRENT_DATE, you do not need to modify the ALTER TABLE statement each time you want to repartition the data based on the new dates.

In both cases, the partitioning starts on a year boundary. In the first example, the ALTER TABLE statement does not change this, so partitioning continues to start on a year boundary. However, you can specify an ALTER TABLE statement that changes the partitioning to start on a different boundary. For example, you can roll forward to start on a particular month in a year by specifying the desired dates in the ALTER TABLE statement.

In the second example, which uses CURRENT_DATE, you can only roll forward to start on a year boundary. However, you can modify the example as follows so that partitioning can be used to roll forward to start at the beginning of a month. This case assumes that, as of the CREATE TABLE date, the Orders table will contain the last 71 months of history plus the current month and 12 months in the future (a total of 84 months).

   CREATE TABLE Orders
     (o_orderkey INTEGER NOT NULL,
      o_custkey INTEGER,
      o_orderstatus CHAR(1) CASESPECIFIC,
      o_totalprice DECIMAL(13,2) NOT NULL,
      o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL,
      o_orderpriority CHAR(21),
      o_comment VARCHAR(79))
   PRIMARY INDEX (o_orderkey)
   PARTITION BY RANGE_N(o_orderdate BETWEEN 
      CAST(((EXTRACT(YEAR FROM CURRENT_DATE)-1900)*10000 +
             EXTRACT(MONTH FROM CURRENT_DATE)*100 + 01) AS DATE) -
          INTERVAL '71' MONTH
      AND
      CAST(((EXTRACT(YEAR FROM CURRENT_DATE)+1-1900)*10000 +
             EXTRACT(MONTH FROM CURRENT_DATE)*100 + 01) AS DATE)+
          INTERVAL '13' MONTH - INTERVAL '1' DAY
      EACH INTERVAL '1' MONTH)
   UNIQUE INDEX (o_orderkey);

You can schedule the following ALTER TABLE statement to occur monthly or less frequently (but before running out of future months). This statement rolls the partition window forward by dropping and adding partitions so that the Orders table continues to contain the last 71 months of history plus the current month and 12 months in the future.

   ALTER TABLE Orders TO CURRENT WITH DELETE;

You can define the following simpler partitioning, but it might not be optimized, and the entire table might be scanned to reconcile rows when you submit an ALTER TABLE TO CURRENT statement. This case assumes that, as of the CREATE TABLE date, the Orders table will contain about 2,191 days of history plus the current day and about 365 days in the future (a total of about 7 years).

   CREATE TABLE Orders
     (o_orderkey INTEGER NOT NULL,
      o_custkey INTEGER,
      o_orderstatus CHAR(1) CASESPECIFIC,
      o_totalprice DECIMAL(13,2) NOT NULL,
      o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL,
      o_orderpriority CHAR(21),
      o_comment VARCHAR(79))
   PRIMARY INDEX (o_orderkey)
   PARTITION BY RANGE_N(o_orderdate BETWEEN
      CURRENT_DATE - INTERVAL '6' YEAR
         AND
      CURRENT_DATE + INTERVAL '1' YEAR
      EACH INTERVAL '1' MONTH)
   UNIQUE INDEX (o_orderkey);

You can schedule the following ALTER TABLE statement to occur daily or less frequently (but before running out of future days). This statement rolls the partition window forward by dropping and adding partitions only if the CURRENT_DATE is the same day of the month as the day when the last CREATE TABLE or ALTER TABLE TO CURRENT statement was submitted. Otherwise, the entire table is scanned to reconcile the rows.

   ALTER TABLE Orders TO CURRENT WITH DELETE;

This can be very inefficient if the ALTER TABLE statement is not submitted on the same day of the month as the day when the last CREATE TABLE or ALTER TABLE TO CURRENT statement was submitted. Performance degrades as the number of days between the last resolved date and the new resolved date increases due to the increasing number of rows that must be moved.

For example, if the last resolved date was January 1, 2008, and the next ALTER TABLE TO CURRENT statement is submitted on February 2, 2008, all the rows of the table will be moved to new partitions.

Example 8: Defining Ranges

The following example defines 5 ranges. The session collation is ASCII.

   RANGE_N(animal BETWEEN *, 'ape', 'bird', 'bull' AND 'cow', 
           'dog' AND *, NO RANGE, UNKNOWN)

where:

Range Includes...
1 all values less than 'ape'.
2 strings greater than or equal to 'ape' and less than 'bird'.
3 strings greater than or equal to 'bird' and less than 'bull'.
4 strings between 'bull' and 'cow'.
5 strings greater than or equal to 'dog'.

If the value of animal matches one of the defined ranges, RANGE_N returns the number of the matched range.

If the value of animal is greater than 'cow' but less than 'dog', it does not match any of the ranges, so RANGE_N returns 6 because NO RANGE is specified.

If the value of animal is NULL, RANGE_N returns 7 because UNKNOWN is specified.

Example 9: Defining Five Ranges

The following example defines 5 ranges. The session collation is ASCII.

   RANGE_N(animal BETWEEN *, 'ape', 'bird', 'bull' AND 'cow', 
           'dog' AND *, UNKNOWN)

where:

Range Includes...
1 all values less than 'ape'.
2 strings greater than or equal to 'ape' and less than 'bird'.
3 strings greater than or equal to 'bird' and less than 'bull'.
4 strings between 'bull' and 'cow'.
5 strings greater than or equal to 'dog'.

If the value of animal matches one of the defined ranges, RANGE_N returns the number of the matched range.

If the value of animal is greater than 'cow' but less than 'dog', it does not match any of the ranges, so RANGE_N returns NULL because NO RANGE is not specified.

If the value of animal is NULL, RANGE_N returns 6 because UNKNOWN is specified.

Example 10: Defining Two Ranges

In this example, the session collation is ASCII when submitting the CREATE TABLE statement, and the pad character is <space>. The example defines two ranges (numbered 1 and 2):

  • Any values greater than or equal to 'a         ' (a followed by 9 spaces) or less than 'b         ' are mapped to partition 1.
  • Any values greater than or equal to 'b         ' or less than 'c         ' are mapped to partition 2.
       CREATE SET TABLE t2
          (a VARCHAR(10) CHARACTER SET UNICODE NOT CASESPECIFIC,
           b INTEGER)
       PRIMARY INDEX (a)
       PARTITION BY RANGE_N(a BETWEEN 'a','b' AND 'c');

The following INSERT statement inserts a character string consisting of a single <tab> character between the 'b' and '1'.

   INSERT t2 ('b	1', 1);

The following INSERT statement inserts a character string consisting of a single <space> character between the 'b' and '1'.

   INSERT t2 ('b 1', 2);

The following SELECT statement shows the result of the INSERT statements. Since the <tab> character has a lower code point than the <space> character, the first string inserted maps to partition 1.

   SELECT PARTITION, a, b FROM t2 ORDER BY 1;
   *** Query completed. 2 rows found. 3 columns returned.
   *** Total elapsed time was 1 second.
   PARTITION  a          b
 -----------  ------ -----
           1  b 1        1   (string contains single <tab> character)
           2  b 1        2   (string contains single <space> character)