A partitioned primary index is single-level or multiple-level, depending on the number of partitioning expressions defined for the table.
You can define partitioning for global temporary and volatile tables, for base tables (but not queue tables), and for noncompressed join indexes.
Optimal partitioning expressions are typically coded using CASE_N or RANGE_N expressions based on exact numeric, character or DateTime columns. DateTime expressions can include the BEGIN and END bound functions and the DATE, CURRENT_DATE, and CURRENT_TIMESTAMP functions.
The test value you specify with a RANGE_N function must have a BYTEINT, SMALLINT, INTEGER, BIGINT, DATE, TIMESTAMP(n), TIMESTAMP(n) WITH TIME ZONE, CHARACTER, VARCHAR, or GRAPHIC CHARACTER SET GRAPHIC data type.
The test value you specify with a RANGE_N function must have a BYTEINT, SMALLINT, INTEGER, BIGINT, DATE, TIMESTAMP(n), TIMESTAMP(n) WITH TIME ZONE, CHARACTER, VARCHAR, GRAPHIC, or VARCHAR(n) CHARACTER SET GRAPHIC data type.
Support for the BEGIN and END bound functions also includes support for the IS [NOT] UNTIL_CHANGED and IS [NOT] UNTIL_CLOSED functions.
You can specify any valid Date or Time element for the IS [NOT] UNTIL_CHANGED and IS [NOT] UNTIL_CLOSED functions in partitioning expressions based on the CASE_N function.
You cannot specify multivalue compression for the partitioning columns of a partitioning expression.
Row-partitioning optimizes range queries while also providing efficient primary index join strategies. Analyze your range query optimization needs carefully because there are performance tradeoffs between specific range access improvements and possible decrements for primary index accesses and joins and aggregations on the primary index that occur as a function of the number of populated row partitions.
The partitioning maxima using different types of partitioning expressions are listed in the following table. These maxima apply to both row-partitioned tables and join indexes and to column-partitioned tables and join indexes (which may also have row partitioning).
| Partitioning Parameter | Maximum Value |
|---|---|
| Combined partitions for a table with 2-byte partitioning. | 65,535 |
| Combined partitions for a table with 8-byte partitioning. | 9,223,372,036,854,775,807 |
| Valid range for the number of partitioning levels for a table or join index with 2-byte partitioning. | 1 - 15 |
| Valid range for the number of partitioning levels for a table or join index with 8-byte partitioning. | 1 - 62 |
| RANGE_N Number of ranges plus NO RANGE [OR UNKNOWN] and UNKNOWN partitions, with 2-byte partitioning. |
65,535 |
| RANGE_N Number of ranges plus NO RANGE [OR UNKNOWN] and UNKNOWN partitions (whether or not specified), with 8-byte partitioning. |
9,223,372,036,854,775,807 |
| CASE_N Number of conditions. |
Bound by the request text size and other limits. |
When you have a partitioned table, the bits stored in the internal partition number of the rowID represent the combined partition number of the row, which is determined by combining the partitioning expressions of a PARTITION BY clause into a single expression. The expression that combines the partitioning expressions of multilevel partitioning into a single combined partition number is called a combined partitioning expression.
The terms internal partition number and combined partition number are defined as follows.
| Term | Definition |
|---|---|
| Internal partition number | A value Vantage calculates from the combined partition
number. The internal partition number is used to number partitions internally and is stored in the rowID. The internal and combined partition numbers can be identical if no modification is needed. For single-level partitioning only, partitions for the NO RANGE [OR UNKNOWN], NO CASE [OR UNKNOWN], and UNKNOWN options are placed at fixed internal partitions, followed by partitions for range and conditions following. Modification is required to retain existing internal partition numbers after an ALTER TABLE request that drops or adds ranges or partitions. For a table without partitioning, the internal partition number is zero. |
| Combined partition number | The value computed for the
partitioning expressions for a row in a partitioned table. For a table without partitioning, the combined partition number is 0. Combining the results for a table with single-level partitioning, the combined partition number is the same as the value of the single partitioning expression. When you specify the system-determined columns PARTITION or PARTITION#Ln in a request, Vantage converts the internal partition number stored in the rowID into the appropriate combined partition number or the partition number of the specified partitioning level and returns that value to the requestor. |
Vantage groups partitioned rows into partition groups on an AMP first by their internal partition number, then by row hash value within each internal partition (if there is a primary index, otherwise by the assigned hash bucket), and then by uniqueness value.
If a SELECT query specifies values for all the primary index columns, the AMP that contains those rows can be determined, and only one AMP must be accessed. If the query conditions are not specified on the partitioning columns, then each internal partition can be probed to find the rows based on the hash value, assuming there is no usable alternative index. If conditions are also specified on the partitioning columns, then row partition elimination may further reduce the number of partitions to be probed on that AMP. See Row Partition Elimination .
If a SELECT query does not specify the values for all primary index columns, an all-AMP, full-table scan is required for a table with no partitioning when there is no usable alternative index. With partitioning, if conditions are specified on the partitioning columns, row partition elimination may reduce an all-AMP, full-table scan to an all-AMP scan of only the partitions that are not eliminated. The extent of row partition elimination depends on the partitioning expressions, the conditions specified in the query, and the ability of the Optimizer to recognize such opportunities.
Suppose a query requests only orders dated August 2010. For a nonpartitioned table, the whole table is read to determine which rows are from August, because August rows are scattered throughout the table. For a table partitioned by month, row partition elimination can be used to exclude partitions that do not contain activity for August.
Because all the August rows are grouped together in the partitioned table, the first row for August can be read first, and then then the rows can be read sequentially until a September row is found and the system stops reading rows. This query requested about 1/12 of the rows, so only about 1/12 of the partitioned table must be read. If you partition by day and change the query to make the date August 2nd, an even smaller subset of the table is all that must be read.
Rows for current day activity that are inserted later tonight into the nonpartitioned table are scattered throughout the entire table, so the average hits per block value is low. For the partitioned table, the inserts are clustered in a smaller area, providing a better locality of reference. If the partitioning is by week or by day, the inserts are clustered even more tightly, and the hits per block value is higher.
Deleting the oldest month of data is a full-table scan for the nonpartitioned table, but for the partitioned table the rows cluster together, and can all be deleted with a high number of average hits per block. Transient journaling is not done for each of the rows, further enhancing performance when all the rows of a partition are deleted as the last action of a transaction.
If the table is partitioned on order date, you do not need a NUSI on order date. You can drop any NUSI for the nonpartitioned table when you convert the primary index for the table to a partitioned primary index.
- A majority of the requests in the workload specify range predicates on a column, especially a date column, of the candidate partitioned table.
- A majority of the queries in the workload specify an
equality predicate on a column of the candidate partitioned table, and that
column is either of the following:
- Not the only column in the primary index column set
- Not a primary index column.
- The primary index is used for the following
applications.
- Primarily or exclusively to distribute rows evenly across the AMPs.
- Rarely, if ever, to access rows or to join tables.
- To access rows using a condition specified on a column that is suitable for partitioning the table.
- The primary index, defined with the entire set of
partitioning columns, if any, is used for the following applications.
- To distribute rows evenly across the AMPs.
- To access rows directly or as a table join condition.
- The primary index, defined without the entire set of
partitioning columns, is used for the following applications.
- To distribute rows evenly across the AMPs.
- To access rows directly or as a table join
condition.
You must pay attention to weighing and optimizing the performance tradeoffs the situation makes possible.
- Partitioned access by means of an equality constraint on all primary index and partitioning columns is as efficient as the same access made by means of a nonpartitioned primary index.
- Partitioned access on an equality constraint on the primary index columns and an equality or other constraint on the partitioning columns that limits access to a single partition, produces performance as efficient as that made with a nonpartitioned primary index.
- Partitioned access with a constraint on the partitioning columns can approach that of nonpartitioned primary index access when all the primary index columns are specified with equality constraints on an expression that does not reference any columns, depending on the extent of row partition elimination (an indirect measure of the number of internal partitions that must be probed before the desired row is found) required.
- Partitioned access using an equality condition on the primary index that neither includes all of the partitioning columns nor makes a constraint on the partitioning columns may not be as efficient as access by means of a nonpartitioned primary index depending on the number of internal partitions that contain data, but is typically more efficient than a full-table scan.
- Partitioned access when not all of the primary index columns are specified with equality constraints, but there are conditions specified on the partitioning columns that limit the number of internal partitions that must be scanned is more efficient than access by a nonpartitioned primary index, which requires a full-table scan.
- Inserting a large number of rows can be much faster than
the same insert operation into an identical table defined without
partitioning.
To optimize load performance, the first partitioning expression of the table must match the row arrival pattern of data, which is most often based on transaction date. For example, base the first partitioning expression on a DATE or TIMESTAMP column.
- Accessing rows using conditions on partitioning columns can
be faster than the same access to the same rows when the table is defined
with a nonpartitioned primary index.
The performance of SELECT, UPDATE, and DELETE operations is improved when queries specify conditions on the partitioning columns of the table, because such predicates cause a higher average number of hits per data block.
The following design considerations are also important for this partitioned performance characteristic:- The degree of improvement in the performance of select, update, and delete operations is proportional to the extent of row partition elimination that can be realized. See Row Partition Elimination .
That is, the more internal partitions that can be excluded by conditions specified on the partitioning columns for a partitioned table, the better.
- Performance improvements are greater as a function of how finely the granularity is defined for the internal partitions.
- The degree of improvement in the performance of select, update, and delete operations is proportional to the extent of row partition elimination that can be realized. See Row Partition Elimination .
- Depending on the number of internal partitions that contain
data, accessing rows by means of an equality constraint on their primary
index can be slower for a partitioned table than for the equivalent
nonpartitioned table when you neither include all of the partitioning
columns in the primary index definition nor specify a constraint on those
columns in the query.The following design considerations are important for this partitioned performance characteristic:
- The more coarse the granularity of the partitioning you define, or the more internal partitions eliminated, the less performance degradation you are likely to experience.
- Defining an appropriate secondary index on the partitioned table may minimize the performance degradation of primary index access.
- Joins can be different for partitioned and nonpartitioned
tables that are otherwise equivalent, but the effect of the different join
strategies may not be easy to predict.
The join plan the Optimizer pursues depends on collected statistics, dynamic-AMP samples, and derived statistics. The usual recommendation applies here: check EXPLAIN reports to determine the best way to design your indexes to achieve the optimal join geography.
The following design considerations are important for this partitioned performance characteristic:- Primary index-to-primary index joins are more
likely to generate different join plans than other
partitioned-nonpartitioned join comparisons.
To minimize the potential for performance issues in making primary index-to-primary index joins, consider the following guidelines:
Partition the two tables identically, if possible.
A coarser granularity of partitions is likely to be superior to a finer partition granularity.
- Examine your EXPLAIN reports to determine which
join methods the Optimizer is selecting to join the tables.
Rowkey-based merge joins are typically better than joins based on another family of join methods.
- Efficient row partition elimination can often convert joins that perform poorly into good performers.
- Poor join performance is typical when joining a partitioned table with a nonpartitioned table, the partitioned table partitioning column set is not defined in the nonpartitioned table, there are no predicates on the partitioned table partitioning column set, and there are many internal partitions defined for the partitioned table.
- Primary index-to-primary index joins are more
likely to generate different join plans than other
partitioned-nonpartitioned join comparisons.
- The need for secondary indexes is often different for
partitioned and nonpartitioned tables that are otherwise equivalent.
Opposing scenarios present themselves for resolving the issues that may arise from the existence or absence of secondary indexes:
| Recommendation | Situation |
|---|---|
| Drop an existing secondary index on the partitioning column set of a partitioning expression. | Enforcing the uniqueness of the
partitioning column set of a partitioning expression is
optional.Typically, uniqueness is not required. This has the following benefits:
|
| Add a new secondary index on the primary index column set. | One or more columns of the
partitioning column set is not also a member of the primary index
column set and the primary index values are unique. In this situation, you must define a USI on the primary index column set to enforce its uniqueness because you cannot define a partitioned primary index to be unique unless its definition contains all of the partitioning columns. The addition of the USI may cause worse performance. |
- Does a partitioning expression based on a CASE_N or RANGE_N expression better support workloads against the table?
- Does the partitioning expression specify a NO CASE, NO CASE
OR UNKNOWN, NO RANGE, NO RANGE OR UNKNOWN, or UNKNOWN option?
If the test value in a RANGE_N expression can never be null, there is no need for an UNKNOWN or NO RANGE OR UNKNOWN partition.
If a condition in a CASE_N expression can never be unknown, there is no need for an UNKNOWN or NO CASE OR UNKNOWN partition.
- Should the table be partitioned on only one level or on multiple levels?
- If the partitioning expression specifies CURRENT_DATE functions, CURRENT_TIMESTAMP functions, or both, how can you configure the expression to minimize problems deriving from reconciling to a new current date or current timestamp value?
- The query workloads that access the partitioned table.
This factor must be examined at both the specific level and at the general level.
- Performance
- Does a nonpartitioned table perform better than a partitioned table for the given workload and for critical queries?
- Is one partitioning strategy more high-performing than others?
- Do other indexes such as USIs, NUSIs, or join indexes improve performance?
- Does a partitioning expression cause significant row partition elimination for queries to occur or not?
- Access methods and predicate conditions
- Is access to the table typically made by primary index, secondary index, or another access method?
- Do typical queries specify an equality condition on the primary index and include the complete partitioning column set?
- Do typical queries specify a non-equality condition on the primary index or the partitioning columns?
- Join strategies
- Do typical queries specify an equality condition on the primary index column set (and, if different, the partitioning column set)?
- Do typical queries specify an equality condition on the primary index column set but not on the partitioning column set?
- Do typical query conditions support row partition elimination?
- Data maintenance
- What are the relative effects of a partitioned table and a nonpartitioned table with respect to the maintenance workload for the table?
- If you must define a USI on the primary index to make the index unique, how much additional maintenance is required to update the USI subtable?
- Frequency and ease of altering partitions
- Is a process in place to make sure that ranges are added and dropped as necessary?
- Does the partitioning expression permit you to add and drop ranges?
- If the number of rows moved by dropping and adding ranges causes large numbers of rows to be moved, do you have a process in place to instead create a new table with the desired partitioning in place, then INSERT ... SELECT or MERGE the source table rows into the newly created target table with error logging?
- If you want to delete rows when their partition is dropped, have you specified NO RANGE, NO RANGE OR UNKNOWN, or UNKNOWN partitions?
- Backup and restore strategies.
See Single-Level Partitioning Case Studies for a set of design scenarios that evaluate the performance effects of different partitioned primary indexes.
See Single-AMP Queries and Partitioned Tables and All-AMP Tactical Queries and Partitioned Tables for specific design issues related to partitioned table support for tactical queries.