- You see a significant I/O performance improvement for requests that access a variable small subset of the columns and rows of a column-partitioned table or join index. This includes accessing columns to respond to both predicates and projections.
For example, if 20% of the data in rows is required to return the result set for a query, the I/O for a column partitioned table is approximately 20% of the I/O for the same table without column partitioning.
Column partitioning can further reduce I/O depending on the effectiveness of autocompression and row header compression.
Additional I/O may be required to reassemble rows, if a large number of columns are projected or specified in query predicates.
- You may see a negative impact on the performance of queries that access more than a small subset of the columns or rows of a column-partitioned table or join index.
- You may see a relative increase in spool size compared to the size of a source column-partitioned table.
When data from an autocompressed column-partitioned table is spooled, Database Engine 20 does not carry the autocompression over to the spool because the spool is row-oriented. This can lead to a large spool relative to the compressed data in the column-partitioned table.
Because user-specified compression is carried over to the spool, applying user-specified compression to the column-partitioned table may be beneficial if spool usage in terms of space consumption and I/O operations becomes an issue.
- You may see a reduction in CPU usage for column-partitioned tables.
At the same time, consumption of CPU resources may increase to process autocompression, decompression, and containers.
With a reduction in I/O and a possible increase in CPU, workloads can change from being I/O-bound to being CPU-bound, and the performance of CPU-bound workloads may not improve, and may even be worse, with column partitioning.
- You may see a negative performance impact on INSERT operations for a column-partitioned table or join index, especially for single-row inserts, and less so for block-at-a-time bulk insert operations such as array inserts and INSERT ... SELECT operations.
For an INSERT ... SELECT operation, the CPU cost increases as the number of column partitions increases because there is a cost to split a row into multiple column partitions.
You must understand the potential tradeoffs when you consider the number of column partitions you create for a table. For example, workloads that contain a large number of INSERT operations can benefit from a table with fewer column partitions with respect to CPU usage, but creating the table with columns in individual partitions may be optimal for space usage and decreasing the number of I/Os, so you must determine an appropriate balance among the factors that you can finely tune.
For example, a good candidate for column partitioning is a table where the workloads that access the table are heavily query-oriented, and the benefits gained from column partitioning, even though the partitioning increases the CPU cost to load the data, a good tradeoff.
Value compression and autocompression can have a negative impact on CPU consumption for workloads that tend to insert many rows, similar to the impact that is seen with a nonpartitioned table that has compression. Because Vantage applies autocompression to every column partition by default, this can cause a significant increase in CPU consumption compared to multivalue compression, which may only be selectively applied to columns.
However, compression can significantly reduce space usage and decrease the I/O operations required for the INSERT operations and for subsequent requests, making the tradeoff between increased CPU consumption and decreased I/O operations a factor that must be considered.
FastLoad and MultiLoad are not supported for column-partitioned tables.
- You may see a negative performance impact for UPDATE operations that select more than a small subset of rows to be updated. Because updates are done as a DELETE operation followed by an INSERT operation, Vantage must access all of the columns of rows selected for update.
The cost of performing updates that access many rows when the table is column-partitioned might not be acceptable, and if it is not, avoid column partitioning the table. For these kinds of UPDATE operations, doing an INSERT ... SELECT request into a copy of the table might be a better alternative.
- Take care not to over-partition tables.
In extreme cases of over-partitioning, a column-partitioned table may be as much as 22 times larger than a table that is not column-partitioned.
Row partitioning of a column-partitioned table may cause over-partitioning. That is, the number of values with the same combined partition number is small, so a small number of values are included in each container, reducing the effectiveness of row header compression.
When increasingly more containers are required to respond to a request, each supporting fewer column partition values, the advantage of row header compression is lost. Also, more I/O is required to access the same amount of useful data. A data block may contain a mix of eliminated and non-eliminated combined partitions for a query. But to read the non-eliminated combined partitions, the entire data block must be read and, therefore, eliminated combined partitions in the data block are also being read unnecessarily.
Over-partitioning may exist when populated combined partitions have fewer than 10 data blocks. With 10 data blocks per combined partition, 10% of the data that Database Engine 20 reads is optional by a request. As the number of data blocks decreases, in increasingly large quantity of unneeded data is read. In the worst case, even if there is column partition elimination, all the data blocks of the table must be read.
The magnitude of performance changes when accessing a column-partitioned table or join index, both positive and negative, can range over orders of magnitude, depending on the workload. Do not column-partition a table when performance is so severely compromised that you cannot offset the reduced performance with physical database design choices such as join indexes.
These and other impacts of column partitioning are described in more detail in the topics that follow.
- Cases Where Positive Performance Effects Are Most Likely to Occur
- Cases Where Negative Performance Effects Are Most Likely to Occur
- Autocompression
- Examples
- Autocompression and Spools
- Autocompression Interactions with User-Specified Compression Methods
- Checking the Effectiveness of Autocompression
- Using the NO AUTO COMPRESS Option
- Anticipated Workload Characteristics for Column-Partitioned Tables and Join Indexes
- General Performance Guidelines for Column Partitioning
- Guidelines for Queries, Contexts, and Table Maintenance for Column Partitioning
- Guidelines for Partitioning Column-Partitioned Tables and Join Indexes
- Guidelines for Specifying Table and Column Attributes for Column Partitioning
- Guidelines for Specifying Compression for Column-Partitioned Tables and Join Indexes
- Guidelines on Optimizing I/O Operations, CPU Usage, and Disk Space Usage for Column-Partitioned Tables and Join Indexes
- Guidelines for Collecting Statistics on Column-Partitioned Tables and Join Indexes
- Deleting Rows from a Column-Partitioned Table
- Updating a Column-Partitioned Table
- Operations and Utilities for Column-Partitioned Tables