15.00 - Basing the Partitioning Expression on a CASE_N Function - Teradata Database

Teradata Database Design

prodname
Teradata Database
vrm_release
15.00
category
User Guide
featnum
B035-1094-015K

Basing the Partitioning Expression on a CASE_N Function

You can use the CASE_N function o concisely define a partitioning expression for which each partition contains data based on an associated condition. When you specify CASE_N for single‑level partitioning expression, two partition numbers, NO CASE and UNKNOWN, are automatically reserved for specific uses.

The CASE_N function is patterned after the SQL CASE expression (see SQL Functions, Operators, Expressions, and Predicates). The function returns an INTEGER value numbered from 1, indicating which CASE_N condition first evaluated to TRUE for the particular value. The returned value can map directly to a partition number or be further modified to calculate the partition number.

Assume a table has a total_revenue column, defined as DECIMAL. The table can be partitioned on that column, so that low revenue products are separated from high revenue products. The partitioning expression can be written as follows:

     CREATE TABLE …
     PRIMARY INDEX (store_id, product_id, sales_date)
     PARTITION BY CASE_N (total_revenue <   10000,
                          total_revenue <  100000,
                          total_revenue < 1000000,
                  NO CASE, UNKNOWN);

This request defines 5 partitions, conceptually numbered from 1 to 5 in the order they are specified in the partitioning expression.

 

This partition number …

Represents …

            1

products with total_revenue less than 10,000.

            2

products with total_revenue of at least 10,000, but less than 100,000.

            3

products with total_revenue of at least 100,000 but less than 1,000,000.

            4

   (NO CASE)

any value that does not evaluate to TRUE or UNKNOWN for any previous CASE_N condition, which in this case is total_revenue equal to or greater than 1,000,000.

            5

(UNKNOWN)

values for which it is not possible to determine the truth value of a previous CASE_N expression.

For this partitioning condition, a row with a null for total_revenue is assigned to the UNKNOWN partition, because by definition, it is not possible to evaluate whether a null is less than 10,000.

The rows in the NO CASE and UNKNOWN partitions are valid, and the system accesses those partitions to process queries unless the query conditions exclude them. By defining NO CASE and UNKNOWN partitions, you can ensure that any possible value maps to a partition. In the absence of those partitions, some values result in errors and so are not be inserted into the table. In practice, it is probably better to have more than three revenue ranges unless the queries against this table rarely specify narrower revenue ranges.

This example demonstrates that CASE_N can be used to define complicated partitioning expressions tailored to a specific table and specific query workloads:

     CREATE TABLE 
     …
     PRIMARY INDEX (col1, col2)
     PARTITION BY CASE_N (col3 < 6,
                          col3 >= 8 
                  AND     col3 < 10 
                  AND     col4 <> 12,
                          col5 <> 10 
                  OR      col3 = 20,
                  NO CASE OR UNKNOWN);

Without knowing the meaning and data demographics of the columns, there is no way of knowing whether this partitioning expression is useful.

Note that the NO CASE and UNKNOWN partitions are combined into a single partition in this example.

Unlike the case for the previous partitioning expression examples, the Optimizer knows how many partitions are defined when CASE_N is used as the partitioning expression and does not have to assume a default number of 65,535 partitions.

There are several disadvantages to this form:

  • The partitioning of the table cannot be altered unless it is empty.
  • Row partition elimination for queries is often limited to constant or USING value equality conditions on the partitioning columns.
  • The Optimizer might not eliminate some row partitions that it possibly could if the partitioning were better conceived.
  • As the number of conditions increases, evaluating a CASE_N function can be costly in terms of CPU cycles, and a CASE_N might also contribute to causing the table header to exceed its maximum size limit. Therefore, you may need to limit the number of conditions you define in the CASE_N function to a relatively small number.
  • See “Considerations for Basing a Partitioning Expression on a CASE_N Function” on page 349 for additional information on this topic.