- The number of queries in workloads that access the table has a range constraint, particularly a date constraint on a column of the table.
- Queries have an equality constraint on a column of the table, and that column is not the only primary index column or is not a primary index column.
- If there is a primary index that is used only, or principally, to achieve an even distribution of rows, rather than for accessing or joining rows, and access is frequently made on a column that is suitable for partitioning.
- If there is a primary index that is used to achieve an even distribution of rows and for accessing or joining rows, and columns suitable for partitioning are included in the primary index definition.
- If there is a primary index that is used to achieve an even distribution of rows and for accessing or joining rows, but columns suitable for partitioning are not included in the primary index definition.
This may be a good candidate for partitioning, but you need to pay attention to weighing the performance tradeoffs that often cause this situation.
- Use the RANGE_N function for a partitioning expression, preferably on a column with a DATE or TIMESTAMP data type, because doing so typically provides more opportunities for row partition elimination, and the Optimizer knows the exact number of defined row partitions.
For example, instead of the following row partitioning expression.
PARTITION BY column
use
PARTITION BY RANGE_N(column BETWEEN m AND n EACH s)
Dates and timestamps are often used in query conditions and therefore makes good candidates for a partitioning expression.
When partitioning on a DATE column, use RANGE_N with a single overall range divided into ranges of equal size as follows.
PARTITION BY RANGE_N(date_column BETWEEN DATE '...' AND DATE '...' EACH INTERVAL 's' t)
- Use DATE constants or TIMESTAMP constants such as DATE '2011-08-06' or TIMESTAMP '2011-08-25 10:14:59' to specify the ranges in a partitioning expression. This is easier to read, makes clear that the expression is defined on a date, and removes the dependence on the FORMAT used in the implicit conversion to a date. You can also specify TIMESTAMP(n) WITH TIME ZONE constants for RANGE_N partitioning.
For example, you can specify a RANGE_N-based partitioning expression like the following.
RANGE_N(t BETWEEN TIMESTAMP '2003-01-01 00:00:00+13:00' AND TIMESTAMP '2009-12-31 23:59:59-12:59' EACH INTERVAL '1' MONTH)
Use an INTERVAL constant in the EACH clause where the variable t is DAY, MONTH, YEAR, or YEAR TO MONTH.
Do not use INTEGER values or CHARACTER constants for dates in your partitioning expressions, because their values can easily be incorrectly specified and may not convert to the date you expected.
For example, partitioning by the name of a DATE column as follows may seem intuitive:
PARTITION BY sales_date
This form does not produce a syntax error, works correctly for dates in the early 1900s, and follows the rule of implicit conversion to get an INTEGER partition number, but such a table is rarely useful.
For this case, instead use RANGE_N with a granularity of EACH INTERVAL '1' DAY.
Specifying the following to indicate that a date column in the primary index is to be partitioned by week may also seem intuitive:
PARTITION BY 7
However, this form does produce a syntax error.
For this case, use RANGE_N with a granularity of EACH INTERVAL '7' DAY.
- Specify only the date ranges you need and a small number of ranges for the future.
By limiting ranges to those that are needed, you help the Optimizer to better cost plans and also allow for more efficient primary index access, joins, and aggregations when the partitioning column is not included in the primary index.
This is not as important if you collect current PARTITION statistics.
A good guideline is to define 10% or fewer of the partitions to be empty to be able to handle future dates. Define enough future ranges to minimize the frequency of ALTER TABLE statements needed to drop and add ranges. However, if you make changes too infrequently, you may forget to alter the table entirely.
Altering the table only once a year is not recommended for the following reasons:- You must create a large number of empty partitions.
- You may forget to alter the partitioning ranges.
- You fail to follow the procedure often enough to prevent problems from occurring when you finally get around to following it.
You must balance these concerns of having enough future partitions, but not too many.
- RANGE_N permits a faster partitioning change using ALTER TABLE … DROP RANGES or ALTER TABLE ... ADD RANGES.
- Reference a single INTEGER or DATE column in the RANGE_N function.
Do not use an expression such as x/10 in place of a simple column reference in a partitioning expression constructed from a RANGE_N function even if the expression only references a single column.
- Define ranges of equal size using EACH to specify the granularity of the partition.
Multiple ranges, with or without specifying an EACH granularity, require more CPU time to run and different sized ranges can prevent fast partitioning changes.
- Consider not specifying the NO RANGE, NO RANGE OR UNKNOWN, or UNKNOWN partitions in the RANGE_N function.
Using these partitions can degrade query performance because queries can be forced to scan data in these partitions unnecessarily. If you specify the NO RANGE, NO RANGE OR UNKNOWN, or UNKNOWN partitions, you can also affect the performance of ALTER TABLE partitioning change negatively because data may need to be moved among the partitions. An UNKNOWN partition is not needed if the partitioning cannot produce rows with unknown partition values (this is often the case when a partitioning column is specified to be NOT NULL).
- Deciding not to use a RANGE_N function for a partitioning expression can be a good choice in the following instances:
- You do not partition the table or join index on a DATE column.
- You use constant or USING variable equality conditions on the partitioning columns in a majority of the queries in your workloads to specify a single partition.
If the assumption made by the Optimizer that the table (not based on a CASE_N partitioning expression) has 65,535 partitions provides good plans in these cases, you need not alter partitioning, and the system produces the plans and data maintenance performance your site requires.
Consider using row partitioning when your workloads have any of the following characteristics: