Teradata Database Design

Product: Teradata Database
Release Number: 15.10
Language: English (United States)
Last Update: 2018-10-06
Product Category: Software

Column Partitioning Performance

You can exploit column partitioning to improve the performance of some classes of workloads. The general performance impact of column partitioning is summarized in the following bullets.

  • You should see a significant I/O performance improvement for requests that access a small, variable subset of the columns and rows of a column‑partitioned table or join index. This includes accessing columns to respond to both predicates and projections.

    For example, if 20% of the data in the rows is required to return the result set for a query, the I/O for a column‑partitioned table should be approximately 20% of the I/O for the same table without column partitioning.

    Column partitioning can further reduce I/O depending on the effectiveness of autocompression and row header compression.

    Additional I/O might be required to reassemble rows if many columns are projected or specified in query predicates.
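
    For illustration, the following sketch (hypothetical table and column names, not taken from this manual) creates a column‑partitioned table and runs a query that touches only 2 of its 7 user columns, so only those column partitions need to be read:

        CREATE TABLE sales_cp (
          sale_id    INTEGER,
          sale_date  DATE,
          store_id   INTEGER,
          item_id    INTEGER,
          quantity   INTEGER,
          amount     DECIMAL(10,2),
          note       VARCHAR(200)
        )
        NO PRIMARY INDEX
        PARTITION BY COLUMN;

        /* Predicate and projection reference only sale_date and amount,
           so roughly only those column partitions are scanned. */
        SELECT sale_date, SUM(amount)
        FROM sales_cp
        WHERE sale_date BETWEEN DATE '2018-01-01' AND DATE '2018-01-31'
        GROUP BY sale_date;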

  • You might see a negative impact on the performance of queries that access more than a small subset of the columns or rows of a column‑partitioned table or join index.
  • You might see a relative increase in spool size compared to the size of a source column‑partitioned table.

    When data from an autocompressed column‑partitioned table is spooled, Teradata Database does not carry the autocompression over to the spool because the spool is row‑oriented. This can lead to a large spool relative to the compressed data in the column‑partitioned table.

    Because user‑specified compression is carried over to the spool, applying user‑specified compression to the column‑partitioned table might be beneficial if spool usage in terms of space consumption and I/O operations becomes an issue.
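
    As a hedged sketch (the columns and compression values are hypothetical), user‑specified multi‑value compression is declared on the columns themselves and, unlike autocompression, is carried over to spool:

        CREATE TABLE sales_cp_mvc (
          sale_id    INTEGER,
          store_id   INTEGER       COMPRESS (1, 2, 3),       -- frequent store numbers
          status     CHAR(1)       COMPRESS ('A', 'C'),      -- frequent status codes
          amount     DECIMAL(10,2) COMPRESS (0.00)
        )
        NO PRIMARY INDEX
        PARTITION BY COLUMN;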

  • You might see a reduction in CPU usage for column‑partitioned tables.

    At the same time, consumption of CPU resources might increase to process autocompression, decompression, and containers.

    With a reduction in I/O and a possible increase in CPU, workloads can change from being I/O‑bound to being CPU‑bound, and the performance of CPU‑bound workloads might not improve, and might even be worse, with column partitioning.

  • You might see a negative performance impact on INSERT operations for a column‑partitioned table or join index, especially for single‑row inserts, and less so for block‑at‑a‑time bulk insert operations such as array inserts and INSERT … SELECT operations.

    For an INSERT … SELECT operation, the CPU cost increases as the number of column partitions increases because there is a cost to split a row into multiple column partitions.

    Teradata Database scans the source ⌈number_user_spec_col_part / number_available_col_part_contexts⌉ times, where:

    Equation element                        Specifies the number of …
    number_user_spec_col_part               user‑specified column partitions.
    number_available_col_part_contexts      available column partition contexts.
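
    For example, with hypothetical values of 50 user‑specified column partitions and 20 available column partition contexts, the source would be scanned ⌈50 / 20⌉ = 3 times.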

    This might be less of a factor to consider than the increase in CPU consumption as the number of column partitions is increased.

    You must understand the potential tradeoffs when you decide how many column partitions to create for a table. For example, workloads that contain a large number of INSERT operations can benefit from a table that has fewer column partitions in terms of CPU usage, while creating the table with each column in its own partition might be better for space usage and for reducing the number of I/Os. You must determine an appropriate balance among these factors.
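
    One way to reduce the number of column partitions is to group columns that are usually accessed together into a single column partition. The following is only a rough sketch (hypothetical table, and the exact grouping syntax should be verified against the CREATE TABLE documentation for your release):

        CREATE TABLE sales_cp_grouped (
          sale_id    INTEGER,
          sale_date  DATE,
          store_id   INTEGER,
          item_id    INTEGER,
          quantity   INTEGER,
          amount     DECIMAL(10,2)
        )
        NO PRIMARY INDEX
        /* Assumed grouping form: store_id, item_id, quantity, and amount share
           one column partition; sale_id and sale_date keep their own partitions. */
        PARTITION BY COLUMN (sale_id, sale_date, COLUMN(store_id, item_id, quantity, amount));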

    For example, a good candidate for column partitioning is a table that is accessed by heavily query‑oriented workloads, where the benefits gained from column partitioning make the increased CPU cost of loading the data a good tradeoff.

    Value compression and autocompression can have a negative impact on CPU consumption for workloads that tend to insert many rows, similar to the impact that is seen with a nonpartitioned table that has compression. Because Teradata Database applies autocompression to every column partition by default, this can cause a significant increase in CPU consumption compared to multi‑value compression, which might only be selectively applied to columns.

    However, compression can significantly reduce space usage and decrease the I/O operations required for the INSERT operations and for subsequent requests, making the tradeoff between increased CPU consumption and decreased I/O operations a factor that must be considered.

    Be aware that FastLoad and MultiLoad are not supported for column‑partitioned tables.
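
    A common approach, shown here only as a hedged sketch with hypothetical object names, is to bulk load into a staging table that is not column‑partitioned and then populate the column‑partitioned table with a block‑at‑a‑time INSERT … SELECT:

        /* sales_stage is a conventional staging table (not column-partitioned)
           that can be loaded with a bulk load utility; sales_cp is the
           column-partitioned target from the earlier sketch. */
        INSERT INTO sales_cp (sale_id, sale_date, store_id, item_id, quantity, amount, note)
        SELECT sale_id, sale_date, store_id, item_id, quantity, amount, note
        FROM sales_stage;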

  • You might see a negative performance impact for UPDATE operations that select more than a small subset of rows to be updated. Because updates are done as a DELETE operation followed by an INSERT operation, Teradata Database needs to access all of the columns of rows selected for update.

    The cost of performing updates that access many rows when the table is column‑partitioned might not be acceptable, and if it is not, you should avoid column partitioning the table. For these kinds of UPDATE operations, doing an INSERT … SELECT request into a copy of the table might be a better alternative.
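
    A hedged sketch of that alternative (object names are hypothetical, and the CASE expression stands in for whatever change the UPDATE would have made): create an empty copy of the table, apply the change while copying the rows with INSERT … SELECT, and then swap the tables:

        /* Assumes the copy keeps the source table definition, including
           column partitioning; otherwise, spell out the full DDL. */
        CREATE TABLE sales_cp_new AS sales_cp WITH NO DATA;

        INSERT INTO sales_cp_new
        SELECT sale_id, sale_date, store_id, item_id, quantity,
               CASE WHEN store_id = 42 THEN amount * 1.10 ELSE amount END,
               note
        FROM sales_cp;

        RENAME TABLE sales_cp TO sales_cp_old;
        RENAME TABLE sales_cp_new TO sales_cp;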

  • Take care not to over-partition tables.

    In extreme cases of over-partitioning, a column‑partitioned table might be as much as 22 times larger than a table that is not column‑partitioned.

    Row partitioning of a column‑partitioned table might result in over-partitioning such that only a few values with the same combined partition number occur, so only a few values are included in each container, which reduces the effectiveness of row header compression.

    As more containers are required to respond to a request, each holding fewer column partition values, the advantage of row header compression is lost. In addition, more I/O is required to access the same amount of useful data. A data block might contain a mix of eliminated and non‑eliminated combined partitions for a query, but to read the non‑eliminated combined partitions, the entire data block must be read; therefore, eliminated combined partitions in the data block are also read unnecessarily.

    Over-partitioning may exist when populated combined partitions have fewer than 10 data blocks. With 10 data blocks per combined partition, about 10% of the data that Teradata Database reads is not required by a request. As the number of data blocks per combined partition decreases, an increasingly large quantity of unneeded data is read. In the worst case, even if there is column partition elimination, all the data blocks of the table must be read.
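
    As a hedged sketch (hypothetical table; the appropriate granularity depends entirely on data volume), choosing a coarser row‑partitioning granularity for a column‑partitioned table helps keep each populated combined partition large enough to span many data blocks:

        /* Monthly rather than daily row partitions keep the number of
           combined partitions small for a modestly sized table. */
        CREATE TABLE sales_cp_monthly (
          sale_id    INTEGER,
          sale_date  DATE,
          store_id   INTEGER,
          amount     DECIMAL(10,2)
        )
        NO PRIMARY INDEX
        PARTITION BY (
          COLUMN,
          RANGE_N(sale_date BETWEEN DATE '2018-01-01'
                            AND     DATE '2018-12-31'
                            EACH INTERVAL '1' MONTH)
        );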

    The magnitude of performance changes when accessing a column‑partitioned table or join index, both positive and negative, can range over several orders of magnitude depending on the workload. You should not column‑partition a table when performance is so severely compromised that you cannot offset the reduced performance with physical database design choices such as join indexes.

    These and other impacts of column partitioning are described in more detail in the topics that follow.