Column Partitioning Performance - Advanced SQL Engine - Teradata Database

You can exploit column partitioning to improve the performance of some classes of workloads. The general performance impact of column partitioning is summarized in the following bullets.
  • You should see a significant I/O performance improvement for requests that access a small, variable subset of the columns and rows of a column-partitioned table or join index. This includes columns accessed both to evaluate predicates and to project results.

    For example, if 20% of the data in rows is required to return the result set for a query, the I/O for a column-partitioned table should be approximately 20% of the I/O for the same table without column partitioning.

    Column partitioning can further reduce I/O depending on the effectiveness of autocompression and row header compression.

    Additional I/O might be required to reassemble rows if many columns are projected or specified in query predicates.

  • You might see a negative impact on the performance of queries that access more than a small subset of the columns or rows of a column-partitioned table or join index.
  • You might see a relative increase in spool size compared to the size of a source column-partitioned table.

    When data from an autocompressed column-partitioned table is spooled, Vantage does not carry the autocompression over to the spool because the spool is row-oriented. This can lead to a large spool relative to the compressed data in the column-partitioned table.

    Because user-specified compression is carried over to the spool, applying user-specified compression to the column-partitioned table might be beneficial if spool usage in terms of space consumption and I/O operations becomes an issue.

  • You might see a reduction in CPU usage for column-partitioned tables.

    At the same time, consumption of CPU resources might increase to process autocompression, decompression, and containers.

    With a reduction in I/O and a possible increase in CPU, workloads can change from being I/O-bound to being CPU-bound, and the performance of CPU-bound workloads might not improve, and might even be worse, with column partitioning.

  • You might see a negative performance impact on INSERT operations for a column-partitioned table or join index, especially for single-row inserts, and less so for block-at-a-time bulk insert operations such as array inserts and INSERT ... SELECT operations.

    For an INSERT ... SELECT operation, the CPU cost increases as the number of column partitions increases because there is a cost to split a row into multiple column partitions.

    You must understand the potential tradeoffs when you decide how many column partitions to create for a table. For example, workloads that contain a large number of INSERT operations can benefit from a table with fewer column partitions in terms of CPU usage, but placing columns in their own individual partitions might be better for space usage and for decreasing the number of I/Os, so you must find an appropriate balance among these factors.

    For example, a good candidate for column partitioning is a table that is accessed by heavily query-oriented workloads, where the benefits gained from column partitioning make the increased CPU cost of loading the data a good tradeoff.

    Value compression and autocompression can have a negative impact on CPU consumption for workloads that tend to insert many rows, similar to the impact that is seen with a nonpartitioned table that has compression. Because Vantage applies autocompression to every column partition by default, this can cause a significant increase in CPU consumption compared to multivalue compression, which might only be selectively applied to columns.

    However, compression can significantly reduce space usage and decrease the I/O operations required for the INSERT operations and for subsequent requests, making the tradeoff between increased CPU consumption and decreased I/O operations a factor that must be considered.

    Be aware that FastLoad and MultiLoad are not supported for column-partitioned tables.

  • You might see a negative performance impact for UPDATE operations that select more than a small subset of rows to be updated. Because updates are done as a DELETE operation followed by an INSERT operation, Vantage needs to access all of the columns of rows selected for update.

    The cost of performing updates that access many rows when the table is column-partitioned might not be acceptable, and if it is not, you should avoid column partitioning the table. For these kinds of UPDATE operations, doing an INSERT ... SELECT request into a copy of the table might be a better alternative.

  • Take care not to over-partition tables.

    In extreme cases of over-partitioning, a column-partitioned table might be as much as 22 times larger than a table that is not column-partitioned.

    Row partitioning of a column-partitioned table might result in over-partitioning such that only a few values with the same combined partition number occur, so only a few values are included in each of the containers, which reduces the effectiveness of row header compression.

    When increasingly more containers are required to respond to a request, each supporting fewer column partition values, the advantage of row header compression is lost. In addition, more I/O is required to access the same amount of useful data. A data block might contain a mix of eliminated and non-eliminated combined partitions for a query, but to read the non-eliminated combined partitions, the entire data block must be read; therefore, eliminated combined partitions in the data block are also read unnecessarily.

    Over-partitioning may exist when populated combined partitions have fewer than 10 data blocks. With 10 data blocks per combined partition, 10% of the data that Vantage reads is not required by a request. As the number of data blocks decreases, an increasingly large quantity of unneeded data is read. In the worst case, even if there is column partition elimination, all the data blocks of the table must be read.

The magnitude of performance changes when accessing a column-partitioned table or join index, both positive and negative, can range over several orders of magnitude depending on the workload. You should not column-partition a table when performance is so severely compromised that you cannot offset the reduced performance with physical database design choices such as join indexes.

These and other impacts of column partitioning are described in more detail in the topics that follow.

Cases Where Positive Performance Effects Are Most Likely To Occur

The greatest improvement in the performance of querying column-partitioned tables occurs when a request specifies a highly selective predicate on a column in a single-column partition of a column-partitioned table with hundreds or thousands of columns and only a small number of them are projected by the request.
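
For illustration only, a query of the following shape fits this pattern (the table and column names are hypothetical): the predicate is highly selective on a column that resides in its own single-column partition, and only a few columns are projected, so only a small number of column partitions need to be read.

SELECT o_orderdate, o_totalprice
FROM orders_cp                /* hypothetical column-partitioned table */
WHERE o_custkey = 1234567;    /* highly selective predicate on a single-column partition */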

Cases Where Negative Performance Effects Are Most Likely To Occur

The greatest decrement in the performance of querying column-partitioned tables occurs when the following conditions take place.
  • Most or all of the columns in a column-partitioned table are projected by a request.
  • The request is not selective.
  • The table being queried has thousands of column partitions.
  • The retrieval performance for a column-partitioned table or join index is not good when the number of column partition contexts that are available is significantly fewer than the number of column partitions that must be accessed to respond to the query. Note that there are at least eight available column partition contexts. Depending on your memory configuration, there may be more available contexts.

    When this happens, consider reconfiguring the table or join index to decrease the number of column partitions that need to be accessed by combining column partitions so there are fewer of them.

  • The table being queried has enough row-partitioned levels that there are very few physical rows in populated combined partitions, and the physical rows contain only one or a few column partition values.

Autocompression

When you create a column-partitioned table or join index, Vantage attempts to use one or more methods to compress the data that you insert into the physical rows of the object unless you specify the NO AUTO COMPRESS option at the time you create it or NO AUTO COMPRESS is set as the default. The process of selecting and applying appropriate compression methods to the physical containers of a column-partitioned table or join index is referred to as autocompression.

Autocompression is most effective for a column partition with a single column and COLUMN format.

Vantage only applies autocompression to column partitions with COLUMN format, and then only if it reduces the size of a container. Vantage autocompresses column partitions by default with the following requirements.
  • Minimal CPU resources are required to decompress the data for reading.
  • Vantage does not need to decompress many values to find a single value.

Vantage applies autocompression for a physical row on a per container basis. For efficiency, the system may use the autocompression method chosen for the previous container, including not using autocompression, if that is more effective. Containers in a column partition might be autocompressed in different ways. In most cases, the data type of a column is not a factor, and Vantage compresses values based only on their byte representation. As a general rule, the only difference that needs to be considered is whether the byte representation is fixed or variable length.

For some values there are no applicable compression techniques that can reduce the size of the physical row, so Vantage does not compress the values for that physical row, but otherwise the system attempts to compress physical row values using one of the autocompression methods available to it. When you retrieve rows from a column-partitioned table, Vantage automatically decompresses any compressed column partition values as is necessary.

Because the selection of autocompression methods used for a container is made by Vantage and not by users, the methods that autocompression can select from to compress the data in a column partition are not documented in the Teradata user documentation library.

Examples

Consider the following PARTITION BY clause as an example of what happens with excess partitions. Assume that o_custkey has the INTEGER data type, o_orderdate has the DATE data type, and there are 8 columns defined for the table. Vantage defines 8-byte partitioning because the maximum combined partition number before adding excess partitions to a level is 462,000 ((8+2+1)*500*84), which is greater than the maximum combined partition number for 2-byte partitioning, which is 65,535.

PARTITION BY (COLUMN,
              RANGE_N(o_custkey BETWEEN 0
                                AND     499999
                                EACH 1000),
              RANGE_N(o_orderdate BETWEEN DATE '2003-01-01'
                                  AND     DATE '2009-12-31'
                                  EACH INTERVAL '1' MONTH) )
The partitioning for this database object has the following characteristics.
  • The number of column partitions defined for level 1 is 10, including two internal use column partitions and assuming a single column per partition.

    The maximum number of column partitions is 20, meaning that 10 additional column partitions could be added.

    The maximum column partition number is 21.

  • The number of row partitions defined for level 2 is 500.

    Vantage adds any excess partitions to level 2, so level 2 has a maximum of 5,228,668,955,133,092 row partitions.

    The default for level 2 is ADD 5228668955132592.

  • The number of row partitions defined for level 3 is 84, which is also the maximum number of row partitions for this level.

    Because this is not the first row partitioning level that does not specify an ADD option, the default is ADD 0.

The implication of all this is that the partitioning specified by the preceding PARTITION BY clause is equivalent to the following PARTITION BY clause.

PARTITION BY (COLUMN ADD 10,
              RANGE_N(o_custkey BETWEEN 0
                                AND     499999
                                EACH 1000)
              ADD 5228668955132592,
              RANGE_N(o_orderdate BETWEEN DATE '2003-01-01'
                                  AND     DATE '2009-12-31'
                                  EACH INTERVAL '1' MONTH)
              ADD 0 )

The adjusted maximum combined partition number is 9,223,372,036,854,774,288.
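
As a quick arithmetic check of that value, the adjusted maximum combined partition number is the product of the maximum column partition number (21), the maximum number of level 2 partitions (500 + 5,228,668,955,132,592 = 5,228,668,955,133,092), and the number of level 3 partitions (84). The following stand-alone query is only a sketch for verifying the multiplication.

SELECT CAST(21 AS BIGINT) * 5228668955133092 * 84
       AS adjusted_max_combined_part_no;   /* returns 9,223,372,036,854,774,288 */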

The following example is a full CREATE TABLE request that creates the column-partitioned table named t1.

CREATE TABLE t1 (
  a01 INTEGER, a02 INTEGER, a03 INTEGER, a04 INTEGER, a05 INTEGER,
  a06 INTEGER, a07 INTEGER, a08 INTEGER, a09 INTEGER, a10 INTEGER,
  a11 INTEGER, a12 INTEGER, a13 INTEGER, a14 INTEGER, a15 INTEGER,
  a16 INTEGER, a17 INTEGER, a18 INTEGER, a19 INTEGER, a20 INTEGER,
  a21 INTEGER, a22 INTEGER, a23 INTEGER, a24 INTEGER, a25 INTEGER,
  a26 INTEGER, a27 INTEGER, a28 INTEGER, a29 INTEGER, a30 INTEGER,
  a31 INTEGER, a32 INTEGER, a33 INTEGER, a34 INTEGER, a35 INTEGER,
  a36 INTEGER, a37 INTEGER, a38 INTEGER, a39 INTEGER, a40 INTEGER,
  a41 INTEGER, a42 INTEGER, a43 INTEGER, a44 INTEGER, a45 INTEGER,
  a46 INTEGER, a47 INTEGER, a48 INTEGER, a49 INTEGER, a50 INTEGER,
  a51 INTEGER, a52 INTEGER, a53 INTEGER, a54 INTEGER, a55 INTEGER,
  a56 INTEGER, a57 INTEGER, a58 INTEGER, a59 INTEGER, a60 INTEGER,
  a61 INTEGER, a62 INTEGER, a63 INTEGER, a64 INTEGER, a65 INTEGER,
  a66 INTEGER, a67 INTEGER, a68 INTEGER, a69 INTEGER, a70 INTEGER,
  a71 INTEGER, a72 INTEGER, a73 INTEGER, a74 INTEGER, a75 INTEGER,
  a76 INTEGER, a77 INTEGER, a78 INTEGER, a79 INTEGER, a80 INTEGER,
  a81 INTEGER, a82 INTEGER, a83 INTEGER, a84 INTEGER, a85 INTEGER,
  a86 INTEGER, a87 INTEGER, a88 INTEGER, a89 INTEGER, a90 INTEGER,
  a91 INTEGER, a92 INTEGER, a93 INTEGER, a94 INTEGER, a95 INTEGER,
  a96 INTEGER, a97 INTEGER)
NO PRIMARY INDEX
PARTITION BY (RANGE_N(a02 BETWEEN 1
                         AND    48
                         EACH 1,
              NO RANGE, UNKNOWN)
              ADD 10,
              COLUMN,
              RANGE_N(a03 BETWEEN 1
                         AND    50
                         EACH 10));
The partitioning for this table has the following characteristics.
  • The number of partitions defined for level 1 is 50 and, initially, the maximum number of partitions and the maximum partition number for this level is (50+10) = 60 because there is an ADD 10 clause for this level.
  • The number of partitions defined for level 2 is 99: 97 user-specified partitions plus 2 for internal use.

    Because there is a level of row partitioning without an ADD clause, the column partitioning level has a default of ADD 10, so the maximum number of partitions for this level is 109, with a maximum column partition number of 110.

  • The number of partitions defined for level 3 is 5. Because there is no ADD clause specified for this level and it is the first row partitioning level without an ADD clause, assuming a default of ADD 0, the maximum combined partition number before adding excess partitions to a level is (60*110*5), or 33,000, which is not greater than 65,535, so this partitioning consumes 2 bytes in the row header.

    The actual maximum number of partitions for this level is the largest number that would not cause the maximum combined partition number to exceed 65,535.

    This level has a default of ADD 4, and the maximum number of partitions for the level is 9.

The adjusted maximum combined partition number for the table is (60*110*9), or 59,400.

Any remaining excess partitions can be added to level 1, meaning that the maximum number of partitions for level 1 can be increased such that the maximum combined partition number does not exceed 65,535.

The ADD 10 clause for level 1 can be replaced by ADD 16.

The implication of all this is that the partitioning specified by the preceding CREATE TABLE request is equivalent to the following CREATE TABLE request.

CREATE TABLE t1 (
  a01 INTEGER, a02 INTEGER, a03 INTEGER, a04 INTEGER, a05 INTEGER,
  a06 INTEGER, a07 INTEGER, a08 INTEGER, a09 INTEGER, a10 INTEGER,
  a11 INTEGER, a12 INTEGER, a13 INTEGER, a14 INTEGER, a15 INTEGER,
  a16 INTEGER, a17 INTEGER, a18 INTEGER, a19 INTEGER, a20 INTEGER,
  a21 INTEGER, a22 INTEGER, a23 INTEGER, a24 INTEGER, a25 INTEGER,
  a26 INTEGER, a27 INTEGER, a28 INTEGER, a29 INTEGER, a30 INTEGER,
  a31 INTEGER, a32 INTEGER, a33 INTEGER, a34 INTEGER, a35 INTEGER,
  a36 INTEGER, a37 INTEGER, a38 INTEGER, a39 INTEGER, a40 INTEGER,
  a41 INTEGER, a42 INTEGER, a43 INTEGER, a44 INTEGER, a45 INTEGER,
  a46 INTEGER, a47 INTEGER, a48 INTEGER, a49 INTEGER, a50 INTEGER,
  a51 INTEGER, a52 INTEGER, a53 INTEGER, a54 INTEGER, a55 INTEGER,
  a56 INTEGER, a57 INTEGER, a58 INTEGER, a59 INTEGER, a60 INTEGER,
  a61 INTEGER, a62 INTEGER, a63 INTEGER, a64 INTEGER, a65 INTEGER,
  a66 INTEGER, a67 INTEGER, a68 INTEGER, a69 INTEGER, a70 INTEGER,
  a71 INTEGER, a72 INTEGER, a73 INTEGER, a74 INTEGER, a75 INTEGER,
  a76 INTEGER, a77 INTEGER, a78 INTEGER, a79 INTEGER, a80 INTEGER,
  a81 INTEGER, a82 INTEGER, a83 INTEGER, a84 INTEGER, a85 INTEGER,
  a86 INTEGER, a87 INTEGER, a88 INTEGER, a89 INTEGER, a90 INTEGER,
  a91 INTEGER, a92 INTEGER, a93 INTEGER, a94 INTEGER, a95 INTEGER,
  a96 INTEGER, a97 INTEGER)
NO PRIMARY INDEX
PARTITION BY (RANGE_N(a02 BETWEEN 1
                         AND    48
                         EACH 1,
              NO RANGE, UNKNOWN)
              ADD 16,
              COLUMN ADD 10,
              RANGE_N(a03 BETWEEN 1
                         AND    50
                         EACH 10)
              ADD 4);

Now the maximum combined partition number is (66*110*9), or 65,340. The maximum number of combined partitions is (66*109*9), or 64,746. The number of combined partitions is (50*99*5), or 24,750, and you can alter the table to add additional partitions to each of the partitioning levels.

Autocompression and Spools

A spool generated from a table with autocompression does not have any autocompression, though it might have inherited user-specified compression based on the rules for spool inheriting the compression characteristics of its parent. This means a very large spool relative to the size of the column partitions can be generated if those column partitions are highly compressed by the autocompression and there is little or no selectivity.

Autocompression Interactions With User-Specified Compression Methods

Vantage applies user-specified compression for columns within a multicolumn column partition value, but Vantage applies autocompression to a column partition value as a whole.

Vantage might decide not to use one or more of the user-specified compression techniques with autocompression if it determines that other autocompression techniques, including none, provide better compression for a container.

When the system constructs a container, it first applies any user-specified compression. When the container reaches a certain size limit, which is defined internally, Vantage examines the container to determine what autocompression techniques and user-specified compression can be used to reduce its size. After compressing the container, the system can append additional column partition values to it using the method of autocompression it has determined to be optimal and any user-specified techniques until a container size limit is reached, at which time Vantage constructs a new container.

As an alternative to the previous method, the system may decide that it is more efficient to reuse the compression from the previous container and apply that compression when constructing a container.

If you specify block-level compression, which is not an autocompression technique, for a column-partitioned table or join index, it is applied for data blocks independently of whether Vantage applies autocompression.

Checking the Effectiveness of Autocompression

To check the effectiveness of autocompression, use the SHOWBLOCKS command of the Ferret utility to compare the size of the data or a sample of the data stored with and without compression. You should also compare the performance with and without compression to measure the benefits of each.

SHOWBLOCKS is documented in Utilities.

Using the NO AUTO COMPRESS Option

You can override the autocompression default by specifying the NO AUTO COMPRESS option.

If you specify NO AUTO COMPRESS for a column partition, or if the column partition has ROW format, Vantage always applies any compression techniques (null compression, multivalue compression, or algorithmic compression) that you specify to every container or subrow of the column partition. Row header compression is still applied to column partitions that have COLUMN format. To disable row header compression, specify ROW format when you create a column-partitioned table or join index.

There is some overhead in determining whether or not a physical row should be compressed and, if so, what compression techniques to use. If Vantage determines that there is no appropriate technique to compress the physical rows of the column partition, or if you determine that the compression techniques used do not effectively compress the column partition, you can eliminate this overhead by specifying the NO AUTO COMPRESS option for the column partition.
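
As a sketch of the option, the following CREATE TABLE request (hypothetical table and column names; see the CREATE TABLE documentation for the exact column grouping syntax) disables autocompression for a single-column partition whose values are essentially random and therefore unlikely to compress, which avoids the overhead of checking for autocompression opportunities on that partition.

CREATE TABLE sales_cp (
  txn_no   INTEGER NOT NULL,
  txn_date DATE NOT NULL,
  item_no  INTEGER,
  hash_val CHAR(32) )
NO PRIMARY INDEX
PARTITION BY COLUMN (txn_no, txn_date, item_no,
                     hash_val NO AUTO COMPRESS);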

Anticipated Workload Characteristics for Column-Partitioned Tables and Join Indexes

The expected scenarios for column-partitioned tables and join indexes are those where the partitioned table data is loaded with an INSERT ... SELECT request with possibly some minor ongoing maintenance, and then the table is used to run analytics and data mining, at which point the table or row partitions are deleted, and the scenario then begins over again. This is referred to as an insert once scenario. Column-partitioned tables are not intended to be used for OLTP- or tactical query-type activities.

The design point for data maintenance of column-partitioned database objects is roughly 88% INSERT ... SELECT or array INSERT operations into empty table or row partitions, 2% other INSERT operations, 7% UPDATE operations, and 3% DELETE operations.

Most requests that access a column-partitioned table or join index are expected to be selective on a variable subset of columns, or to project a variable subset of the columns, where the subset accesses fewer than 10% of the column partitions for any particular request.

The expected number of column partitions that need to be accessed by a request should preferably not exceed the number of available column partition contexts, as it does in the following EXPLAIN of a SELECT request. If it does, there may be an undesirable negative impact on performance in some cases.

Note that the count of 21 for the number of column partitions includes the 20 selected partitions in the SELECT statement and the delete column partition from the column-partitioned table. 21 exceeds the number of available column partition contexts, so 20 column partitions (including the delete column partition) of the column-partitioned table are merged into the first subrow column partition of the column-partitioned merge spool. The remaining column partition that needs to be accessed from the column-partitioned table is copied to the second subrow column partition in the column-partitioned merge spool (for a total of 2 subrow column partitions). This reduces the number of column partitions to be accessed at one time (which is limited by the number of available column-partition contexts). The result is then retrieved from the two subrow column partitions of the column-partitioned merge spool.

EXPLAIN SELECT a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, r,
               s, t, u
FROM /*CP*/ t1;

*** Help information returned. 13 rows.
*** Total elapsed time was 1 second.

Explanation
---------------------------------------------------------------------------
1) First, we lock PLS.t1 for read on a reserved RowHash to prevent global deadlock.
2) Next, we do an all-AMPs RETRIEVE step from 21 column partitions (20
   contexts) of PLS.t1 using covering CP merge Spool 2 (2 subrow
   partitions and Last Use) by way of an all-rows scan with no residual
   conditions into Spool 1 (all_amps), which is built locally on the
   AMPs. The size of Spool 1 is estimated with low confidence to be
   2 rows (614 bytes). The estimated time for this step is 0.03 seconds.
3) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
   -> The contents of Spool 1 are sent back to the user as the result of
   statement 1. The total estimated time is 0.03 seconds.

A column-partitioned table can be used as a sandbox table where data can be added until an appropriate indexing method is determined. Requests that access a small, but variable, subset of the columns might run more efficiently against a column-partitioned table compared to a table without column partitioning.
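
A minimal sketch of such a sandbox table (the table, column, and staging names are hypothetical) relies entirely on the defaults and is populated in bulk while access patterns are still being explored.

CREATE TABLE sandbox_events (
  event_ts   TIMESTAMP(0) NOT NULL,
  event_type INTEGER NOT NULL,
  payload    VARCHAR(1000) )
NO PRIMARY INDEX
PARTITION BY COLUMN;

INSERT INTO sandbox_events
SELECT event_ts, event_type, payload
FROM events_staging;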

General Performance Guidelines for Column Partitioning

Use these guidelines as a starting point, but understand that they might not be suitable for all workloads. As you gain experience using column partitioning, you might find that alternate choices are more appropriate in some cases.

Note the following general points about optimizing the performance of column-partitioned tables and join indexes.
  • Keep in mind that column partitioning is not optimal for all query types and workloads.
  • As is always true, you should test and prove any physical database design, preferably first on a test system with a valid sample of the data and then on the production system with the full data before releasing the design into the production environment.
    This includes testing all of the following items before putting a column-partitioned table or join index into production use.
    • Queries
    • Workloads
    • Bulk data loads
    • Maintenance
    • Archive, restore, and copy performance

    Also be sure to check the space usage of the tables.

The performance guidelines for column-partitioned tables and join indexes are divided into the following subgroups.
  • General performance guidelines for column partitioning
  • Guidelines for queries, contexts, and table maintenance
  • Guidelines for partitioning column-partitioned tables and join indexes
  • Guidelines for specifying table and column attributes for column-partitioned tables
  • Guidelines for specifying compression for column-partitioned tables and join indexes
  • Guidelines on I/O operations, CPU usage, and disk space usage for column-partitioned tables and join indexes
  • Guidelines for collecting statistics on column-partitioned tables and join indexes

Guidelines for Queries, Contexts, and Table Maintenance for Column Partitioning

  • The optimal application for column-partitioned tables is large fact tables or call detail tables that are accessed frequently by analytic workloads.

    A join index might provide acceptable performance for requests that depend on primary index access, while column partitioning the base table could enable superior performance for other classes of workloads.

    Use column partitioning for tables accessed by analytic queries where the tables are refreshed using bulk loading techniques periodically with possibly some interim modifications.

    Do not use column-partitioned tables for highly volatile data, for cases in which a table is lightly populated with rows, or if the data is to be used in tactical query workloads.

  • To facilitate query performance, you should make sure that you follow at least one, if not both, of the following guidelines.
    • Ensure that the number of column partitions accessed by a request does not exceed the number of available column partition contexts.
    • Write the request in such a way that it is highly selective.

      Ideally, you should ensure that both of these guidelines are followed.

      Requests in workloads that heavily access column-partitioned tables and do not conform to this recommendation should be minimized, because their performance is likely to be degraded.

    To support acceptable performance for requests that do not have the desired characteristics, you should employ physical database design options such as secondary and join indexes when possible. Before you employ indexes, you must understand that additional maintenance costs are always incurred because Vantage must update index subtables any time the base table columns they are defined on are updated.

  • Always measure the INSERT cost for tables that are candidates for column partitioning.

    Do not use column partitioning when the increased cost of inserting data is not acceptable or is not offset by improved performance in the query workload.

  • Because you should not use column-partitioned tables for highly volatile data, apply UPDATE operations to column-partitioned tables sparingly.
  • Perform DELETE operations on a column-partitioned table either for the entire table or for entire row partitions.
  • The Optimizer uses the DBS Control field PPICacheThrP to determine the number of available file contexts that can be used at one time to access a partitioned table.

    If the number of available file contexts determined by PPICacheThrP is fewer than 8, then 8 file contexts are available if the maximum data block size for the table is 128 KB or less. If the maximum data block size is greater than 128 KB, the minimum number of contexts may be reduced to fewer than 8, but is always at least 2. If the number of file contexts specified by PPICacheThrP is more than 256, then 256 file contexts are available, again assuming a maximum data block size of 128 KB or less; if the maximum data block size is larger, the number of available file contexts may be smaller. For a column-partitioned database object, Vantage uses the number of file contexts as the number of available column partition contexts.

    A file context can be associated with each column partition context for some operations, but in other cases, Vantage might allocate a buffer to be associated with each column partition context.

    The ideal number of column partition contexts should be at least equal to the number of column partitions that need to be accessed by a query; otherwise, performance can degrade since not all the needed column partitions can be read at one time. Performance and memory usage can be impacted if PPICacheThrP is set too high, because too high a setting can lead to memory thrashing or even a system crash.

    At the same time, the benefits of partitioning can be lessened if PPICacheThrP is set unnecessarily low, and performance might degrade significantly. The default setting is expected to be optimal for most workloads (with the standard 80 AMP worker tasks per AMP); however, after monitoring performance and memory usage, you might need to adjust the setting to obtain the best balance.

  • You should periodically refresh or append new rows to a column-partitioned table, or to the row partitions of the column-partitioned table, using INSERT ... SELECT requests that move large quantities of data.

    Date or timestamp partitioning may help to improve column-partitioned table maintenance.
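
For example, a refresh cycle of this kind might look like the following sketch (the table and column names are hypothetical): new data is bulk loaded into a staging table, for example with FastLoad, and then appended to the column-partitioned table in one large INSERT ... SELECT request.

INSERT INTO orders_cp
SELECT *
FROM orders_staging
WHERE o_orderdate BETWEEN DATE '2009-12-01' AND DATE '2009-12-31';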

Guidelines for Partitioning Column-Partitioned Tables and Join Indexes

  • While you can define column partitioning at any level of multilevel partitioning, in most cases you should follow these guidelines when you configure the partitioning for a column-partitioned table.
    • Code the column partitioning level first and follow the column partitioning level with any row partitioning levels.
    • If you do not code the column partition at the first level, code it as the second level after DATE or TIMESTAMP row partitioning.

      Some considerations that might lead to putting the column partitioning at a lower level are the following.

    • Potential improvements for cylinder migration.
    • Block compression effectiveness.
  • If you specify row partitioning as part of a multilevel column partitioning for a table, consider specifying the ADD option for any partitioning levels that might need to increase their number of partitions in the future.
  • Unless you have a good reason not to, you should use the defaults when you specify PARTITION BY COLUMN. Do not override the defaults without first giving the reasons for doing so serious consideration.
  • For columns that are often specified in queries, but where the specific set of columns specified varies from request to request, you should create single-column partitions for the frequently specified columns.
  • Use ROW format for wide column partitions because it has less overhead than a container that holds one or a few values.

    If Vantage does not assign ROW format for a column partition, but you have determined that ROW format is more appropriate because it decreases space usage, specify ROW explicitly.

  • Use COLUMN format for narrow column partitions, especially if autocompression is effective.

    If Vantage does not assign COLUMN format for a multicolumn partition, but you have determined that COLUMN format is more appropriate (for example, because it decreases space usage), specify COLUMN explicitly.

    You might need to specify COLUMN format explicitly for a multicolumn partition that contains a column with a VARCHAR, VARCHAR(n) CHARACTER SET GRAPHIC, or VARBYTE data type and defined with a large maximum value, but where values are actually very short in most cases. This is because the system-determined format might be ROW based on the large maximum length.
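
The following sketch shows both guidelines (the table and column names are hypothetical, and the exact grouping syntax is described under CREATE TABLE): an explicit COLUMN format for a narrow multicolumn partition whose VARCHAR values are usually much shorter than the declared maximum, and an explicit ROW format for a genuinely wide column.

CREATE TABLE clicks_cp (
  click_ts   TIMESTAMP(0) NOT NULL,
  user_id    INTEGER NOT NULL,
  country_cd CHAR(2),
  city_nm    VARCHAR(1000),    /* large maximum length, but values are usually short */
  agent_txt  VARCHAR(2000) )   /* values really are wide */
NO PRIMARY INDEX
PARTITION BY COLUMN (click_ts, user_id,
                     COLUMN(country_cd, city_nm),
                     ROW(agent_txt));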

Guidelines for Specifying Table and Column Attributes for Column Partitioning

  • Both NoPI tables and column-partitioned tables are created with a MULTISET table type by default.

    Neither NoPI tables nor column-partitioned tables can be created as a SET table. As a result, Vantage does not perform duplicate row checks for either.

  • Use the table-level option DATABLOCKSIZE if the default data block size is not appropriate for the table. Generally, a data block size of 127.5 KB or less is recommended for a column-partitioned table.
  • The settings for the table-level option FREESPACE and the DBS Control field FreeSpacePercent might require adjustment for a column-partitioned table or join index with small internal partitions, as might be the case if a table or join index is also row partitioned, particularly if you add data incrementally to the table or index.

    These options specify the amount of space on each cylinder that is to be left unused during load operations. Reserved free space allows tables to expand within their currently allocated cylinders. This can prevent or delay the need for additional cylinders to be allocated, which incurs the overhead of moving data to the new cylinders. Avoiding new cylinder allocations can improve overall system performance.

    If you do a large INSERT ... SELECT operation and internal partitions are either large or empty, little or no free space is needed. Keeping new table data physically close to existing table data and avoiding data migrations can improve overall system performance.

  • Follow these column attribute guidelines when you create your column-partitioned tables.
    • Specify NOT NULL for columns that should not be null.

      Because nullable columns can significantly decrease the effectiveness of autocompression, you should avoid them unless you have a sound reason for specifying them.

    • Specify column-level CHECK constraints when you can.

      CHECK constraints are valid for both NoPI tables and column-partitioned tables.

    • Specify UNIQUE and PRIMARY KEY constraints when you can.

      UNIQUE and PRIMARY KEY constraints are valid for both NoPI tables and column-partitioned tables.

    • Specify foreign key constraints whenever they are applicable.

      Foreign key constraints are valid for both NoPI tables and column-partitioned tables.

  • The following features are valid for both NoPI tables and for column-partitioned tables:
    • Fallback
    • Unique secondary indexes
    • Nonunique secondary indexes
    • Join indexes
    • Reference indexes
  • You cannot specify permanent journaling for a column-partitioned table or a NoPI table.
  • The following features are not shared by nonpartitioned NoPI tables and column-partitioned tables.
    • You cannot create a column-partitioned global temporary or volatile table.

      A nonpartitioned NoPI table can be created as a global temporary or volatile table.

    • A column-partitioned table can have an identity column, while a nonpartitioned NoPI table cannot.
    • The setting of the DBS Control field PrimaryIndexDefault does not affect the default primary index specification if PARTITION BY is specified. The default behavior for creating a partitioned table without specifying a primary index or primary AMP index is always NO PRIMARY INDEX.
    • You are not required to specify NO PRIMARY INDEX when you create a column-partitioned table, but you may need to specify NO PRIMARY INDEX for a nonpartitioned NoPI table, depending on the DBS Control settings.

      The default behavior for CREATE TABLE for a column-partitioned table is NO PRIMARY INDEX if you do not specify a primary index. The setting of the DBS Control field PrimaryIndexDefault does not affect this behavior.

  • XML, BLOB, and CLOB are valid data types for both nonpartitioned NoPI tables and column-partitioned tables.
    There is a limit of 256M rows per rowkey per AMP with XML, BLOB, and CLOB data types. NoPI tables normally have only one hash value on each AMP, so the effective limit on the number of rows per AMP is approximately 256M.

    The exact number is 268,435,455 rows per rowkey per AMP.

    For column-partitioned tables, these limits apply per column partition and hash bucket combination rather than per hash value as they do for primary AMP index (PA) and NoPI table rows.

Guidelines for Specifying Compression for Column-Partitioned Tables and Join Indexes

  • If autocompression is effective for a column, put that column into a single-column partition even if it is not frequently accessed.
  • If autocompression on the individual columns or subsets of columns is not effective, you should group columns into a column partition when those columns are always or frequently accessed together by queries or are infrequently accessed.
  • If autocompression is not effective for a column partition, specify NO AUTO COMPRESS for the column partition (with COLUMN format) to avoid the overhead of checking for autocompression opportunities when there are none to exploit.

    Autocompression is most effective for single-column partitions with COLUMN format, less so for multicolumn partitions (especially as the number of columns increases) with COLUMN format, but not effective for column partitions with ROW format.

  • Specify multivalue compression, algorithmic compression, or both for known high-occurrence values for columns where compression can be effective. For example, if you know that the data for a column is limited to a few values, you should specify multivalue compression for those values.

    Specify an appropriate algorithmic compression for Unicode columns that contain a substantial number of compressible characters. The ASCII script (U+0000 to U+007F) of Unicode is compressible with the UTF8 algorithm.
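
As a sketch (the table and column names are hypothetical), multivalue compression for a few high-occurrence codes and algorithmic compression for a mostly ASCII Unicode column might be specified as follows, using the TransUnicodeToUTF8 and TransUTF8ToUnicode functions from TD_SYSFNLIB.

CREATE TABLE orders_cp (
  o_orderkey    INTEGER NOT NULL,
  o_orderstatus CHAR(1)
      COMPRESS ('O', 'F', 'P'),      /* multivalue compression for known high-occurrence values */
  o_comment     VARCHAR(200) CHARACTER SET UNICODE
      COMPRESS USING TD_SYSFNLIB.TransUnicodeToUTF8
      DECOMPRESS USING TD_SYSFNLIB.TransUTF8ToUnicode )
NO PRIMARY INDEX
PARTITION BY COLUMN;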

Guidelines on Optimizing I/O Operations, CPU Usage, and Disk Space Usage for Column-Partitioned Tables and Join Indexes

  • The primary intent of column partitioning is to reduce the number of I/O operations undertaken by a query workload. The following factors can all contribute to reducing the I/O required by query workloads on column-partitioned tables.
    • Column partition elimination
    • Row partition elimination for multilevel partitioned tables and join indexes
    • Highly selective query predicates

    Other factors, such as those in the following list, can also play a role in reducing the number of I/O operations required to resolve a query.

    • Autocompression
    • Row header compression
    • User-specified multivalue and algorithmic compression

      Trading I/O for CPU might enhance the performance of many workloads on an I/O-bound system.

  • A secondary intent of column partitioning is to reduce the amount of disk space consumed by table and join index storage. This is particularly effective when Vantage can apply row header compression and autocompression to column-partitioned table and join index data.
  • Although column partitioning is designed to reduce the number of I/O operations required to process workloads, it is not intended to reduce the CPU usage of queries on column-partitioned tables.

    While there are cases where CPU usage decreases for queries made on a column-partitioned table, CPU usage can also increase for some functions such as INSERT operations undertaken on a column-partitioned table.

    For a CPU-bound system, column partitioning might not provide any benefit, and might even degrade performance. An exception is the case where a subset of the workload is I/O-bound even though the system overall is CPU-bound; in that case, column partitioning could be beneficial for that subset. Experiment with running your CPU-bound workloads against both nonpartitioned tables and column-partitioned tables to determine what the differences are.

Guidelines for Collecting Statistics on Column-Partitioned Tables and Join Indexes

  • Collect statistics regularly on the columns and indexes of a column-partitioned table just as you would for any other tables.
  • Always collect statistics on the system-derived PARTITION column.
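
For example (the table and column names are hypothetical):

COLLECT STATISTICS COLUMN (PARTITION) ON orders_cp;   /* system-derived PARTITION column */
COLLECT STATISTICS COLUMN (o_custkey) ON orders_cp;   /* frequently used predicate column */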

Deleting Rows From a Column-Partitioned Table

Consider the following information before deleting data from a column-partitioned table.
  • A DELETE ALL request or an unconstrained DELETE request takes the fastpath DELETE for any table. If specified within an explicit transaction in Teradata session mode, the DELETE request must be the last statement of the last request of the transaction.

    Similarly, if the column-partitioned table is also row partitioned and entire row partitions are deleted, Vantage can do a fastpath DELETE. For these cases, Vantage recovers the space that had been used by the deleted rows.

A fastpath optimization is one that can be performed faster if certain conditions are met. For example, in some circumstances DELETE and INSERT operations can be performed faster if they can avoid reading the data blocks and avoid transient journaling.

  • For all other cases, a DELETE request uses a scan or an index on a column-partitioned table. In this case, the rows are marked as deleted in the delete column partition without recovering the space, but both LOB space and index space are recovered. Column partitions with ROW format do have their space deleted and, if all column partitions have ROW format, the row is not marked as deleted in the delete column partition.

    Because of this, you should only delete rows in this manner for a relatively small number of rows; use the form of DELETE request described in the previous bullet to delete large numbers of rows.

    The space is recovered from the column-partitioned table when all the rows are deleted at the end of a transaction or when the entire row partition that contains the deleted rows is deleted at the end of a transaction.
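
The two cases might look like the following sketch (the table and column names are hypothetical, and the second request assumes its predicate covers complete row partitions of a table that is also row partitioned by o_orderdate).

/* Unconstrained DELETE: eligible for the fastpath and recovers the space. */
DELETE FROM orders_cp;

/* Deleting complete row partitions can likewise recover the space. */
DELETE FROM orders_cp
WHERE o_orderdate BETWEEN DATE '2003-01-01' AND DATE '2003-01-31';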

Updating a Column-Partitioned Table

Consider the following information before updating data in a column-partitioned table.

  • An UPDATE request uses a scan, an index, or a rowID spool to access a column-partitioned table and select the qualifying rows for the update.
  • An UPDATE request is processed in the following way.
  1. Selects the rows to be updated.
  2. Transforms columns to rows.
  3. Deletes the old row without recovering the space and marks its delete bit in the delete column partition.
    Both LOB space and index space are recovered.
  4. Reinserts the updated rows, transforming them from rows to columns and appending the column values to their corresponding combined partitions.

Vantage recovers the space from the column-partitioned table when it deletes all of the rows at the end of a transaction or when it deletes the entire row partition that contains the deleted rows at the end of a transaction.

If the columns being updated are only in column partitions of a table with ROW format (and the columns being updated are not primary AMP index, primary index, or partitioning columns), the update is made in place instead of as a delete of the old row and an insert of the new row.
  • Vantage also updates the columns in the column-partitioned table that are used in a secondary or join index.
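
For updates that touch a large share of the rows, the INSERT ... SELECT alternative mentioned earlier might look like the following sketch (the table, columns, and the change being applied are hypothetical): the data is rebuilt into a copy with the change applied in the SELECT, and the tables are then swapped.

CREATE TABLE orders_cp_new AS orders_cp WITH NO DATA;

INSERT INTO orders_cp_new
SELECT o_orderkey,
       CASE WHEN o_orderstatus = 'P' THEN 'O' ELSE o_orderstatus END,
       o_orderdate,
       o_totalprice
FROM orders_cp;

DROP TABLE orders_cp;
RENAME TABLE orders_cp_new TO orders_cp;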

Operations and Utilities for Column-Partitioned Tables

The following tables provide a list of the operations and utilities that you might consider using with column-partitioned tables. The tables include information such as whether the utility or operation is supported for column-partitioned tables, possible substitutions for utilities and operations that Vantage does not support for column-partitioned tables, and various usage information about the operation or utility as it applies to column-partitioned and nonpartitioned NoPI tables.

The first table lists the SQL operations you might use for column-partitioned tables.

SQL Statement Name Support for Column-Partitioned Tables Usage Notes
DELETE Supported See Deleting Rows From a Column-Partitioned Table.
INSERT Supported While INSERT operations into column-partitioned tables are supported, you should not frequently use single-row INSERT requests to add rows to column-partitioned tables because such requests can cause a large degradation in performance: each row must be transformed into column partition values, and a value must be appended to each of the column partitions.
INSERT ... SELECT Supported  
MERGE Not supported The UPDATE component of a MERGE request is required to fully specify the primary index.

Vantage does not support MERGE requests for NoPI and column-partitioned tables.

Use UPDATE and INSERT requests instead.

UPDATE Supported See Updating a Column-Partitioned Table.
UPDATE (Upsert Form) Not supported The UPDATE component of an Upsert request is required to fully specify the primary index.

Vantage does not support UPSERT requests for NoPI and column-partitioned tables.

The second table lists the Teradata Tools and Utilities operations you might use for column-partitioned tables.

TTU Utility Name Support for Column-Partitioned Tables Usage Notes
CheckTable Supported The LEVEL HASH check, which is done with either an explicit LEVEL HASH command or implicitly in a LEVEL THREE command, works differently on a primary-indexed table and a NoPI or column-partitioned table.
  • For a primary-indexed table, the check regenerates the row hash for each data row based on the primary index values and then compares with the rows on disk.
  • For NoPI tables and column-partitioned tables, the check looks at the row hash value for each data row and verifies that the hash bucket that is part of the row hash correctly belongs to the AMP.
FastExport Supported You can export the data in NoPI tables and column-partitioned tables the same way you can export data rows for a primary-indexed table.
FastLoad Not supported Use INSERT ... SELECT or Teradata Parallel Data Pump requests instead. These are normally used to populate a column-partitioned table.

You can also use FastLoad to load data into a staging table and then use an INSERT ... SELECT request to populate the column-partitioned table.

MultiLoad Not supported Use INSERT ... SELECT and Teradata Parallel Data Pump requests instead. These are normally used to populate a column-partitioned table.
Reconfiguration Supported Reconfiguration processing for a NoPI table or column-partitioned table is similar to the Reconfiguration processing that is done for a primary-indexed table.

The main difference is that a NoPI table or column-partitioned table normally has one hash bucket per AMP, which is like a very skewed NUPI table.

Because of this, when you reconfigure a system to have more AMPs, there might be some AMPs that do not have any data for a NoPI table or column-partitioned table.

As a result, Reconfiguration can cause data skewing for both NoPI and column-partitioned tables.

Restore and Copy Supported Restore and Copy processing for a NoPI or column-partitioned table are very similar to the processing used by those utilities for a primary-indexed table.

The main difference is that a NoPI table or column-partitioned table normally has one hash bucket per AMP, which is like a very nonunique NUPI table.

Because of this, when you restore or copy the data from a NoPI table to a different configuration that has more AMPs, there might be some AMPs that do not have any data.

As a result, Restore and Copy can cause data skewing for both NoPI and column-partitioned tables.

TableRebuild Supported Table Rebuild processing for a NoPI or column-partitioned table is the same as the Table Rebuild processing for a primary-indexed table.

The table must have fallback protection for Table Rebuild to be able to rebuild it. Rows in a NoPI table or column-partitioned table have a rowid, so Vantage can rebuild them the same way it rebuilds rows for a primary-indexed table.