15.00 - Column-Partitioned Tables - Teradata Database

Teradata Database SQL Data Definition Language Detailed Topics

prodname
Teradata Database
vrm_release
15.00
category
Programming Reference
featnum
B035-1184-015K

Column‑Partitioned Tables

The term Teradata Columnar describes NoPI tables that are column‑partitioned. More generally, such tables are referred to as column‑partitioned tables. Teradata Columnar is disabled by default and it can only be enabled by Teradata support personnel.

Teradata Columnar partitioning is a physical database design implementation that consists of an integrated set of options that support column partitioning, columnar storage, autocompression, and other capabilities that you can specify for column‑partitioned tables. Columnar storage packs the data values of a column partition into a series of containers, which significantly reduces the number of row headers required to store the data.

You can store data in a column partition using either traditional row storage (indicated by specifying the ROW keyword) or columnar storage (indicated by specifying the COLUMN keyword), or a mix of both.

You can set the cost profile constant AutoCompressDefault to determine whether the default autocompression for column partitions is AUTO COMPRESS or NO AUTO COMPRESS. For more information about the AutoCompressDefault cost profile constant, see SQL Request and Transaction Processing.

Teradata Replication Services does not support column‑partitioned tables.

You can define all of the following features for column-partitioned tables.

  • TRANSACTIONTIME columns, VALIDTIME columns, or both.
  • Fallback
  • Secondary indexes
  • Join indexes
  • UNIQUE column constraints
  • CHECK constraints
  • PRIMARY KEY and FOREIGN KEY constraints
  • Identity columns
  • AUTO COMPRESS or NO AUTOCOMPRESS in the column definition list.
  • AUTO COMPRESS or NO AUTOCOMPRESS in the PARTITION BY clause.
  • Triggers
  • BLOB, CLOB, XML, ARRAY, VARRAY, UDT, Period, Geospatial, and row‑level security constraint columns.
  • Note: There is a limit of approximately 64K rows per row hash value for LOBs. Because there is normally only 1 row hash value per AMP for column-partitioned tables, there is also a limit of approximately 64K rows per AMP for column-partitioned tables that contain columns typed as BLOBs or CLOBs.

    You cannot define permanent journaling for column‑partitioned tables.

    You cannot define any of the following table types as column-partitioned tables.

  • Global temporary tables
  • Volatile tables
  • Global temporary trace tables
  • Queue tables
  • Error tables
  • The default syntax for Teradata Columnar incorporates the following features.

  • The default index definition clause specification for column‑partitioned tables is NO PRIMARY INDEX.
  • The setting of the DBS Control parameter PrimaryIndexDefault does not affect this default.

  • Teradata Database implicitly assigns each individual column to its own column partition.
  • While this is the default method of storing column partition values, you can also explicitly group multiple columns into a single container.

  • Teradata Database determines whether column partitions are stored in row or columnar format.
  • While this is the default format for storing column partition values, you can also explicitly specify either COLUMN or ROW format for a partition.

    You can further define multiple columns to be assigned to the same column partition either in the column list for the table or in its partitioning expression. You do this by delimiting columns to be grouped into the same column partition between parenthesis characters.

  • Teradata Database implicitly detects and applies a number of different autocompression methods to enable more efficient storage. Autocompression methods are only applied if they reduce the size of the physical row.
  • Note: This is the default compression option. You can also explicitly specify NO AUTO COMPRESS if you do not want Teradata Database to implicitly assign autocompression methods to your column partitions.

    See “CREATE TABLE” in SQL Data Definition Language Syntax and Examples for details of the syntax used to create a column‑partitioned table. Columnar storage packs the values of a column partition into a series of containers (column partitions stored using COLUMN format) or subrows (column partitions stored using ROW format), significantly reducing the number of row headers that would otherwise be required per stored container.

    Teradata Columnar partitioning is optimal for 2 purposes.

  • As staging tables for FastLoad and Teradata Parallel Data Pump array INSERT load operations (see Database Design for details).
  • Once the data has been loaded into these tables, you can use SQL requests such as INSERT … SELECT, MERGE, or UPDATE … FROM to copy the rows to their destination primary‑indexed tables.

    You can also use INSERT … SELECT and UPDATE requests to copy rows from a primary‑indexed source table into a column‑partitioned target table, while you cannot use MERGE requests to do the same.

  • As sandbox tables when an appropriate primary index has not yet been defined for the table the rows will eventually populate.
  • The expected usage for a column-partitioned table is the case where its partitions are loaded using an INSERT … SELECT request (possibly followed by intermittent minor maintenance) and then used to run data mining analytics.

    Continuing with this scenario, after the analytical work or data mining has been completed, the expectation is that you will either delete the entire table or that you will delete individual row partitions from it.

    Following that, the expectation is that the rows will be copied to a primary‑indexed table and the identical scenario will repeat with new data inserted into the column‑partitioned table.

    This type of scenario is referred to as an insert once scenario. Column‑partitioned tables are not intended to be used for OLTP activities.

    See “Usage Guidelines for Column‑Partitioned Tables” on page 580, “Rules and Restrictions for Column‑Partitioned Tables” on page 583, and “Performance Issues for Column‑Partitioned Tables” on page 594 for more information about how to optimize your use of column‑partitioned tables.

    You can partition a column‑partitioned table in 3 different ways.

  • By column
  • By row
  • By a combination of columns and rows
  • You must use multilevel partitioning to combine column‑partitioning and row‑partitioning for a table because you cannot mix column and row partitioning in the same partitioning level.

    Column partitioning enables the Optimizer to develop efficient search plans by using column partition elimination based on the columns that are required to process a query. If a column is not needed by a request, the column partition containing that column does not need to be read, significantly enhancing query performance.

    Column partitioning also enables DML requests to efficiently access selected data from column partitions, significantly reducing query I/O.

    You can use the following SQL DML statements to manipulate column‑partitioned table data.

  • DELETE
  • INSERT
  • SELECT
  • UPDATE
  • Note: The MERGE statement is not included in this list. You cannot use MERGE requests to insert rows into a column‑partitioned table nor can you use MERGE requests to update the data in a column‑partitioned table. This is because you must specify the primary index of the target table when you use a MERGE request to insert or update the data of a table, and column‑partitioned tables do not have a primary index.