15.00 - Anticipated Workload Characteristics for Column-Partitioned Tables and Join Indexes - Teradata Database

Teradata Database Design

Teradata Database
User Guide

Anticipated Workload Characteristics for Column‑Partitioned Tables and Join Indexes

The expected scenarios for column-partitioned tables and join indexes are those where the partitioned table data is loaded with an INSERT … SELECT request with possibly some minor ongoing maintenance, and then the table is used to run analytics and data mining, at which point the table or row partitions are deleted, and the scenario then begins over again. This is referred to as an insert once scenario. Column-partitioned tables are not intended to be used for OLTP‑ or tactical query‑type activities.

The design point for data maintenance of column‑partitioned database objects is roughly 88% INSERT … SELECT or array INSERT operations into empty table or row partitions, 2% other inserts operations, 7% update operations, and 3% delete operations.

Most requests that access a column‑partitioned table or join index are expected to be selective on a variable subset of columns, or to project a variable subset of the columns, where the subset accesses fewer than 10% of the column partitions for any particular request.

The expected number of column partitions that need to be accessed for requests should preferably not exceed the number of available column partition contexts as it does in the following EXPLAIN of a SELECT request. If it does, there may be undesirable negative impact on performance in some cases.

Note that the count of 21 for the number of column partitions includes the 20 selected partitions in the SELECT statement and the delete column partition from the column‑partitioned table. 21 exceeds the number of available column partition contexts, so 20 column partitions (including the delete column partition) of the column‑partitioned table are merged into the first subrow column partition of the column‑partitioned merge spool. The remaining column partition that needs to be accessed from the column‑partitioned table is copied to the second subrow column partition in the column‑partitioned merge spool (for a total of 2 subrow column partitions). This reduces the number of column partitions to be accessed at one time (which is limited by the number of available column-partition contexts). The result is then retrieved from the two subrow column partitions of the column‑partitioned merge spool.

     EXPLAIN SELECT a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, r, 
                    s, t, u
     FROM /*CP*/ t1;
 *** Help information returned. 13 rows.
 *** Total elapsed time was 1 second.
  1) First, we lock a distinct PLS."pseudo table" for read on a RowHash
     to prevent global deadlock for PLS.t1.
  2) Next, we lock PLS.t1 for read.
  3) We do an all-AMPs RETRIEVE step from 21 column partitions (20 
     contexts) of PLS.t1 using covering CP merge Spool 2 (2 subrow 
     partitions and Last Use) by way of an all-rows scan with no residual
     conditions into Spool 1 (all_amps), which is built locally on the
     AMPs.  The size of Spool 1 is estimated with low confidence to be
     2 rows (614 bytes).  The estimated time for this step is 0.03 seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.03 seconds.

A column‑partitioned table can be used as a sandbox table where data can be added until an appropriate indexing method is determined. Requests that access a small, but variable, subset of the columns might run more efficiently against a column‑partitioned table compared to a NoPI table without column partitioning.