16.20 - Performance Issues for Column-Partitioned Tables - Teradata Database - Teradata Vantage NewSQL Engine

Teradata Vantage™ SQL Data Definition Language Detailed Topics

Product
Teradata Database
Teradata Vantage NewSQL Engine
Release Number
16.20
Release Date
March 2019
Content Type
Programming Reference
Publication ID
B035-1184-162K
Language
English (United States)

Column partitioning is a physical database design choice that can improve the performance of some classes of workloads. Columnar partitioning is expected to have the following performance impacts.

  • You should expect a significant I/O performance improvement for queries that access a variable small subset of the columns either in predicates or as projections, and rows of a column-partitioned table.

    For example, if 20% of the data in rows is needed to satisfy a query, the I/O for a column-partitioned table should be approximately 20% of the I/O for a table without column partitioning.

    I/O might be further reduced depending on the effectiveness of autocompression. Additional I/O might be needed to reconstruct rows if many columns are projected or used in predicates.

  • A potentially significant negative performance impact can occur for requests that access more than a small subset of the columns, rows, or both of a column-partitioned table.
  • CPU usage might increase for handling autocompression, decompression, and containers. With a reduction in I/O and a possible increase in CPU use, workloads can change from being I/O-bound to being CPU-bound.

    The performance of CPU-bound workloads might not improve with column partitioning.

  • A potentially significant negative performance impact on INSERT operations for a column-partitioned table or join index, especially for single-row INSERT operations, but less so for bulk insert operations such as array INSERT and INSERT … SELECT statements.
  • A potentially significant negative performance impact for UPDATE operations that select more than a small subset of rows to be updated, as described by the first 2 bullets. Teradata Database does update operations as a DELETE operation followed by an INSERT operation; therefore, all the columns of rows selected for UPDATE operations must be accessed.

There is a range of from 1 or more orders of magnitude of improved performance to 1 or more orders of magnitude of degraded performance for accessing and updating column-partitioned tables.

The greatest improvement occurs when there is a highly selective predicate on a column in a single-column partition of a table with hundreds or thousands of columns and only a few columns are projected.

The worst case is when a query is not very selective, most or all of the columns in the table are projected, there are thousands of column partitions, only 8 available column partition contexts, and the table is row-partitioned in a way that there are very few rows in a populated partition.