Anticipated Workload Characteristics for Column-Partitioned Tables and Join Indexes

Anticipated Workload Characteristics for Column-Partitioned Tables and Join Indexes - Teradata Vantage

Teradata® VantageCloud Lake

Deployment

VantageCloud

Edition

Lake

Product

Teradata Vantage

Published

January 2023

Language

English (United States)

Last Update

2024-04-03

dita:mapPath

phg1621910019905.ditamap

dita:ditavalPath

pny1626732985837.ditaval

dita:id

phg1621910019905

The expected scenarios for column-partitioned tables and join indexes are those where the partitioned table data is loaded with an INSERT ... SELECT request with possibly minor ongoing maintenance, and then the table is used to run analytics and data mining, when the table or row partitions are deleted, and the scenario then begins again. This is called an insert-once scenario. Column-partitioned tables are not intended to be used for OLTP- or tactical query-type activities.

The design point for data maintenance of column-partitioned database objects is roughly 88% INSERT ... SELECT or array INSERT operations into empty table or row partitions, 2% other inserts operations, 7% update operations, and 3% delete operations.

Most requests that access a column-partitioned table or join index are expected to be selective on a variable subset of columns, or to project a variable subset of the columns, where the subset accesses fewer than 10% of the column partitions for any request.

If the expected number of column partitions that must be accessed for requests exceeds the number of available column partition contexts, as in the following EXPLAIN of a SELECT request, there may be negative impact on performance.

The 21 column partitions includes the 20 selected partitions in the SELECT statement and the delete column partition from the column-partitioned table. 21 exceeds the number of available column partition contexts, so 20 column partitions (including the delete column partition) of the column-partitioned table are merged into the first subrow column partition of the column-partitioned merge spool. The remaining column partition that must be accessed from the column-partitioned table is copied to the second subrow column partition in the column-partitioned merge spool (for a total of 2 subrow column partitions).

This reduces the number of column partitions to be accessed at one time (which is limited by the number of available column-partition contexts). The result is then retrieved from the two subrow column partitions of the column-partitioned merge spool.

EXPLAIN SELECT a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, r,
               s, t, u
FROM /*CP*/ t1;

*** Help information returned. 13 rows.
*** Total elapsed time was 1 second.

Explanation
---------------------------------------------------------------------------
1) First, we lock PLS.t1 for read on a reserved RowHash to prevent global deadlock.
2) Next, we do an all-AMPs RETRIEVE step from 21 column partitions (20
   contexts) of PLS.t1 using covering CP merge Spool 2 (2 subrow
   partitions and Last Use) by way of an all-rows scan with no residual
   conditions into Spool 1 (all_amps), which is built locally on the
   AMPs. The size of Spool 1 is estimated with low confidence to be
   2 rows (614 bytes). The estimated time for this step is 0.03 seconds.
3) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
   -> The contents of Spool 1 are sent back to the user as the result of
   statement 1. The total estimated time is 0.03 seconds.

A column-partitioned table can be used as a sandbox table where data can be added until an appropriate indexing method is determined. Requests that access a small, but variable, subset of the columns may run more efficiently against a column-partitioned table compared to a table without column partitioning.