Column-Partitioned Tables - Advanced SQL Engine

Column-Partitioned Tables - Advanced SQL Engine - Teradata Database

SQL Data Definition Language Detailed Topics

Product

Advanced SQL Engine

Teradata Database

Release Number

17.10

Published

July 2021

Language

English (United States)

Last Update

2021-07-27

dita:mapPath

imq1591724555718.ditamap

dita:ditavalPath

imq1591724555718.ditaval

dita:id

B035-1184

lifecycle

Product Category

Teradata Vantage™

The term Teradata Columnar describes tables with a Primary Index (PI), Primary AMP Index PA), or No Primary Index (NoPI) that are column-partitioned. These tables are referred to as column-partitioned tables. Teradata Columnar is disabled by default and it can only be enabled by Teradata Services personnel.

Teradata Columnar partitioning is a physical database design implementation that consists of an integrated set of options that support column partitioning, columnar storage, autocompression, and other capabilities that you can specify for column-partitioned tables. Columnar storage packs the data values of a column partition into a series of containers, which significantly reduces the number of row headers required to store the data.

You can store data in a column partition using either traditional row storage (indicated by the ROW keyword) or columnar storage (indicated by the COLUMN keyword), or a mix of both.

You can define the following features for column-partitioned tables.

TRANSACTIONTIME columns, VALIDTIME columns, or both.
Fallback
Secondary indexes
Join indexes
UNIQUE column constraints
CHECK constraints
PRIMARY KEY and FOREIGN KEY constraints
Identity columns
AUTO COMPRESS or NO AUTOCOMPRESS in the column definition list.
AUTO COMPRESS or NO AUTOCOMPRESS in the PARTITION BY clause.
Triggers
BLOB, CLOB, XML, ARRAY, VARRAY, UDT, Period, Geospatial, and row-level security constraint columns.
There is a limit of approximately 64K rows per row hash value for LOBs. Because there is normally only 1 row hash value per AMP for column-partitioned tables, there is also a limit of approximately 64K rows per AMP for column-partitioned tables that contain columns typed as BLOBs or CLOBs.

You cannot define permanent journaling for column-partitioned tables.

You cannot define any of the following table types as column-partitioned tables.

Global temporary tables
Volatile tables
Global temporary trace tables
Queue tables
Error tables

The default syntax for Teradata Columnar incorporates the following features.

The default index definition clause specification for column-partitioned tables is NO PRIMARY INDEX.
The setting of the DBS Control parameter PrimaryIndexDefault does not affect this default.
Vantage implicitly assigns each individual column to its own column partition.
While this is the default method of storing column partition values, you can also explicitly group multiple columns into a column partition.
Vantage determines whether column partitions are stored in row or column format.
While this is the default format for storing column partition values, you can also explicitly specify either COLUMN or ROW format for a partition.

You can further define multiple columns to be assigned to the same column partition either in the column list for the table or in its partitioning expression. You do this by delimiting columns to be grouped into the same column partition between parenthesis characters.
Vantage implicitly detects and applies a number of different autocompression methods to enable more efficient storage. Autocompression methods are only applied if they reduce the size of the physical row.
This is the default compression option. You can also explicitly specify NO AUTO COMPRESS if you do not want Vantage to implicitly assign autocompression methods to your column partitions.

See CREATE TABLE in Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144 for details of the syntax used to create a column-partitioned table. Column storage packs the values of a column partition into a series of containers (column partitions stored using COLUMN format) or subrows (column partitions stored using ROW format). COLUMN format significantly reduces the number of row headers that would otherwise be required per stored container.

Teradata Columnar is optimal for:

Improving the I/O performance of accessing a subset of the columns from a table either for evaluating predicates or projections. Because sets of one or more columns can be stored in separate column partitions, only the column partitions that contain the columns referenced by the query need to be accessed from storage thereby significantly reducing the number of data blocks that are read.
Sandbox tables when an appropriate primary index has not yet been defined for the table the rows will eventually populate.

The expected usage for a column-partitioned table is the case where its partitions are loaded using an INSERT … SELECT statement (possibly followed by intermittent minor maintenance) and then used to run data mining analytics.

After the analytical work or data mining has been completed, the expectation is that you will either delete the entire table or that you will delete individual row partitions from it.

Following that, the expectation is that the rows will be loaded into a staging table and the identical scenario will repeat with new data inserted into the column-partitioned table.

This type of scenario is referred to as an insert once scenario. Column-partitioned tables are not intended to be used for OLTP activities.

For more information about how to optimize your use of column-partitioned tables, see Usage Guidelines for Column-Partitioned Tables, Rules and Restrictions for Column-Partitioned Tables, and Performance Issues for Column-Partitioned Tables.

You can partition a column-partitioned table in 3 different ways.

By column
By row
By a combination of columns and rows using multilevel partitioning

Column partitioning enables column partition elimination based on the columns that are required to process a query. If a column is not needed by a request, the column partition containing that column does not need to be read, significantly enhancing query performance.

Column partitioning also enables DML requests to efficiently access selected data from column partitions, significantly reducing query I/O.

You can use the following SQL DML statements to manipulate column-partitioned table data.

DELETE
INSERT
SELECT
UPDATE

The MERGE statement is not included in this list. You cannot use MERGE statements to insert rows into a column-partitioned table nor can you use MERGE statements to update the data in a column-partitioned table. This is because you must specify the primary index of the target table when you use a MERGE statement to insert or update the data of a table, and column-partitioned tables do not have a primary index.