- Queries that access a small, but variable, subset of columns run more efficiently against a column-partitioned table when compared with a nonpartitioned table. The majority of queries against a column-partitioned table should be selective on a variable subset of columns, and project a variable subset of the columns, where the subset accessed is less than 10% of the partitions for any particular query. The number of partitions that need to be accessed for queries should not exceed the number of available column partition contexts.
- Use column partitioning for tables that have the following properties.
- They are accessed using analytic queries.
- They are not frequently refreshed using bulk data loading methods.
- The table partitions are seldom modified in the interim.
- Do not use column-partitioned tables for highly volatile data.
- Place the COLUMN partitioning level at the lowest partitioning level possible, ideally at a lower level than any ROW partitioning levels in the partitioning expression. The COLUMN partitioning level should be the first partitioning level you specify.
The following are some of the considerations that might lead to putting column partitioning at a lower level.
- Potential improvements for cylinder migration.
- Temperature-based block compression effectiveness for both hot and cold data.
- For queries that frequently access columns, but where the specific set of columns accessed varies from request to request, you should place the frequently accessed columns into single-column partitions.
This action enables the Optimizer to use column partition elimination to optimize queries against the table.
- For queries that frequently access the same set of columns across the requests in a workload, you should group the frequently accessed columns into the same partition.
- If you find that autocompression is effective for a column, consider placing that column in a single-column partition even if it is not frequently accessed.
- Autocompression is most effective for single-column partitions with COLUMN format, less so for multicolumn partitions, particularly as the number of columns increases, and not effective for column partitions with ROW format.
- Group columns into a column partition for applications where either of the following is true.
- Queries frequently access the columns.
- Queries do not frequently access the columns, and autocompression of the individual columns or subsets of columns is not effective.
- Use COLUMN format for narrow column partitions, especially if you find that autocompression is effective for a partition.
If the system-determined format is not COLUMN for a column partition, but you determine that COLUMN is more appropriate, specify COLUMN explicitly when you create or alter the table.
You might need to specify COLUMN format explicitly for a column partition that has a column with a VARCHAR, CHARACTER SET VARGRAPHIC, or VARBYTE data type defined with a large maximum value, but where the partition values are actually relatively short in most cases.
This can happen when the system-determined format is ROW because of a large maximum value length.
- Use ROW format for wide column partitions because it has less overhead than a container that holds only 1 or a few values.
If the system-determined format is not ROW for a column partition, but you determine that ROW is more appropriate, specify ROW explicitly when you create or alter the table.
- The Optimizer uses the DBS Control parameter PPICacheThrP to determine the number of available file contexts that can be used at a time to access a partitioned table.
IF PPICacheThrP determines the number of available file contexts to be … THEN Vantage considers this many file contexts to be available … < 8 8 > 256 256 Vantage uses the number of file contexts as the number of available column partition contexts for a column-partitioned table.
Vantage might associate a file context with each column partition context for some operations, and in other cases it might allocate a buffer with each column partition context.Ideally, the number of column partition contexts should be at least equal to the number of column partitions that need to be accessed by a request. Otherwise, performance can degrade because the required column partitions cannot be read at one time.
Performance and memory usage can be impacted if PPICacheThrP is set too high, which can lead to memory thrashing or a system crash. At the same time, the benefits of partitioning can be lessened if the value for the DBS Control parameter PPICacheThrP is set unnecessarily low, causing performance to degrade significantly.
The default is expected to be applicable to most workloads, but you might need to make adjustments to get the best balance.
- You should accept the default DATABLOCKSIZE or the default data block size as defined by the DBS Control parameter PermDBSize for a column-partitioned table unless performance analysis indicates otherwise.
- If you expect to add rows to a table incrementally, you should consider allocating some additional free space if a column-partitioned table has small internal partitions. This can occur, for example, if a table is also row-partitioned.
To allocate additional space, you can either specify a value for the FREESPACE option that is larger than the system default when you create the table, or accept the default as set by the DBS Control parameter FreeSpacePercent if your DBA has set it to a higher default value. For more information about FreeSpacePercent, see Teradata Vantage™ - Database Utilities, B035-1102.
If you plan to load the table with rows using a large INSERT … SELECT statement, and the internal partitions for the table are either large or unpopulated, little or no free space should be required.
The reserved free space enables table data to expand on current table cylinders, preventing or delaying the need for additional table cylinders to be allocated, which prevents or delays data migration associated with new cylinder allocations.
- Keeping new table data physically close to existing table data, and avoiding data migrations, can improve overall system performance.
The following usage guidelines apply to column-partitioned table applications.