You can use the CASE_N function to concisely define a partitioning expression for which each partition contains data based on an associated condition. When you specify CASE_N for single-level partitioning expression, two partition numbers, NO CASE and UNKNOWN, are automatically reserved for specific uses.
The CASE_N function is patterned after the SQL CASE expression. The function returns an INTEGER value numbered from 1, indicating which CASE_N condition first evaluated to TRUE for the value. The returned value can map directly to a partition number or be further modified to calculate the partition number.
Assume a table has a total_revenue column, defined as DECIMAL. The table can be partitioned on that column, so that low revenue products are separated from high revenue products. The partitioning expression can be written as follows:
CREATE TABLE … PRIMARY INDEX (store_id, product_id, sales_date) PARTITION BY CASE_N (total_revenue < 10000, total_revenue < 100000, total_revenue < 1000000, NO CASE, UNKNOWN);
This request defines 5 partitions, numbered from 1 to 5 in the order specified in the partitioning expression.
Partition Number | Representation |
---|---|
1 | Products with total_revenue less than 10,000. |
2 | Products with total_revenue of at least 10,000, but less than 100,000. |
3 | Products with total_revenue of at least 100,000 but less than 1,000,000. |
4 (NO CASE) | Any value that does not evaluate to TRUE or UNKNOWN for any previous CASE_N condition, which is total_revenue equal to or greater than 1,000,000. |
5 (UNKNOWN) | Values for which determining the truth value of a previous CASE_N expression is impossible. For this partitioning condition, a row with a null for total_revenue is assigned to the UNKNOWN partition, because evaluating whether a null is less than 10,000 is impossible. |
The rows in the NO CASE and UNKNOWN partitions are valid, and the system accesses those partitions to process queries unless the query conditions exclude them. By defining NO CASE and UNKNOWN partitions, you can make sure that any possible value maps to a partition. In the absence of those partitions, values that cause errors are not inserted into the table. Typically, having more than three revenue ranges is better unless the queries against this table rarely specify narrower revenue ranges.
This example demonstrates that CASE_N can be used to define complicated partitioning expressions tailored to a specific table and specific query workloads:
CREATE TABLE … PRIMARY INDEX (col1, col2) PARTITION BY CASE_N (col3 < 6, col3 >= 8 AND col3 < 10 AND col4 <> 12, col5 <> 10 OR col3 = 20, NO CASE OR UNKNOWN);
Without knowing the meaning and data demographics of the columns, there is no way of knowing whether this partitioning expression is useful.
The NO CASE and UNKNOWN partitions are combined into a single partition in this example.
The Optimizer knows the number of partitions that are defined when CASE_N is used as the partitioning expression and does not have to assume a default number of 65,535 partitions.
- The partitioning of the table cannot be altered unless empty.
- Row partition elimination for queries is often limited to constant or USING value equality conditions on the partitioning columns.
- If partitioning is better conceived, the Optimizer may eliminate row partitions.
- As the number of conditions increases, evaluating a CASE_N function can be costly in terms of CPU cycles, and a CASE_N may also contribute to causing the table header to exceed its maximum size limit. Therefore, you may need to limit the number of conditions you define in the CASE_N function to a relatively small number.