The term Teradata Columnar describes tables with a Primary Index (PI), Primary AMP Index PA), or No Primary Index (NoPI) that are column-partitioned. These tables are referred to as column-partitioned tables. Teradata Columnar is disabled by default and it can only be enabled by Teradata support personnel.
Teradata Columnar partitioning is a physical database design implementation that consists of an integrated set of options that support column partitioning, columnar storage, autocompression, and other capabilities that you can specify for column-partitioned tables. Columnar storage packs the data values of a column partition into a series of containers, which significantly reduces the number of row headers required to store the data.
You can store data in a column partition using either traditional row storage (indicated by the ROW keyword) or columnar storage (indicated by the COLUMN keyword), or a mix of both.
- TRANSACTIONTIME columns, VALIDTIME columns, or both.
- Secondary indexes
- Join indexes
- UNIQUE column constraints
- CHECK constraints
- PRIMARY KEY and FOREIGN KEY constraints
- Identity columns
- AUTO COMPRESS or NO AUTOCOMPRESS in the column definition list.
- AUTO COMPRESS or NO AUTOCOMPRESS in the PARTITION BY clause.
- BLOB, CLOB, XML, ARRAY, VARRAY, UDT, Period, Geospatial, and row-level security constraint columns.There is a limit of approximately 64K rows per row hash value for LOBs. Because there is normally only 1 row hash value per AMP for column-partitioned tables, there is also a limit of approximately 64K rows per AMP for column-partitioned tables that contain columns typed as BLOBs or CLOBs.
You cannot define permanent journaling for column-partitioned tables.
- Global temporary tables
- Volatile tables
- Global temporary trace tables
- Queue tables
- Error tables
- The default index definition clause specification for column-partitioned tables is NO PRIMARY INDEX.
The setting of the DBS Control parameter PrimaryIndexDefault does not affect this default.
- Teradata Database implicitly assigns each individual column to its own column partition.
While this is the default method of storing column partition values, you can also explicitly group multiple columns into a column partition.
- Teradata Database determines whether column partitions are stored in row or column format.
While this is the default format for storing column partition values, you can also explicitly specify either COLUMN or ROW format for a partition.
You can further define multiple columns to be assigned to the same column partition either in the column list for the table or in its partitioning expression. You do this by delimiting columns to be grouped into the same column partition between parenthesis characters.
- Teradata Database implicitly detects and applies a number of different autocompression methods to enable more efficient storage. Autocompression methods are only applied if they reduce the size of the physical row.This is the default compression option. You can also explicitly specify NO AUTO COMPRESS if you do not want Teradata Database to implicitly assign autocompression methods to your column partitions.
See “CREATE TABLE” in Teradata Vantage™ SQL Data Definition Language Syntax and Examples, B035-1144 for details of the syntax used to create a column-partitioned table. Column storage packs the values of a column partition into a series of containers (column partitions stored using COLUMN format) or subrows (column partitions stored using ROW format). COLUMN format significantly reduces the number of row headers that would otherwise be required per stored container.
- Improving the I/O performance of accessing a subset of the columns from a table either for evaluating predicates or projections. Because sets of one or more columns can be stored in separate column partitions, only the column partitions that contain the columns referenced by the query need to be accessed from storage thereby significantly reducing the number of data blocks that are read.
- Sandbox tables when an appropriate primary index has not yet been defined for the table the rows will eventually populate.
The expected usage for a column-partitioned table is the case where its partitions are loaded using an INSERT … SELECT statement (possibly followed by intermittent minor maintenance) and then used to run data mining analytics.
After the analytical work or data mining has been completed, the expectation is that you will either delete the entire table or that you will delete individual row partitions from it.
Following that, the expectation is that the rows will be loaded into a staging table and the identical scenario will repeat with new data inserted into the column-partitioned table.
This type of scenario is referred to as an insert once scenario. Column-partitioned tables are not intended to be used for OLTP activities.
For more information about how to optimize your use of column-partitioned tables, see Usage Guidelines for Column-Partitioned Tables, Rules and Restrictions for Column-Partitioned Tables, and Performance Issues for Column-Partitioned Tables.
- By column
- By row
- By a combination of columns and rows using multilevel partitioning
Column partitioning enables column partition elimination based on the columns that are required to process a query. If a column is not needed by a request, the column partition containing that column does not need to be read, significantly enhancing query performance.
Column partitioning also enables DML requests to efficiently access selected data from column partitions, significantly reducing query I/O.