Storage and Other Overhead Considerations for Partitioning - Advanced SQL Engine - Teradata Database

Database Design

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Published
July 2021
Language
English (United States)
Last Update
2021-07-27
dita:mapPath
kko1591750222108.ditamap
dita:ditavalPath
kko1591750222108.ditaval
dita:id
B035-1094
lifecycle
previous
Product Category
Teradata Vantageā„¢

Column Partitions With COLUMN Format

A column partition with COLUMN format packs column partition values into a physical row, or container, up to a system-determined limit. The column partition values must be in the same combined partition to be packed into a container.

The row header occurs once for a container instead of there being a row header for each column partition value.

The format for the row header is either 14 or 20 bytes and consists of the following fields.
  • Length
  • rowID

    The rowID of the first column partition value is the rowID of the container.

  • Flag byte
  • First presence byte

The rowID of a column partition value can be determined by its position within the container. If many column partition values can be packed into a container, this row header compression can greatly reduce the space needed for a column-partitioned object compared to the same object without column partitioning.

If Vantage can only place a few column partition values in a container because of their width, there can actually be an increase in the space needed for a column-partitioned object compared to the object without column partitioning. In this case, ROW format might be more appropriate.

If Vantage can only place a few column partition values because the row partitioning is such that only a few column partition values occur for each combined partition, there can be a very large increase in the space needed for a column-partitioned object compared to the object without column partitioning. In the worst case, the space required for the column-partitioned object can be as much as 24 times larger.

In this case, consider altering the row partitioning to allow for more column partition values per combined partition or removing column partitioning.

If the container has autocompression, 2 bytes are used as an offset to compression bits, 1 or more bytes indicate the autocompression types and their arguments, if any, for the container, 1 or bytes, depending on the number of column partition values, of autocompression bits, 0 or more bytes are used for a local value-list dictionary depending the autocompression type, and 0 or more bytes are used for present column partition values.

If the container does not have autocompression either because you specified NO AUTO COMPRESS for the column partition or because no autocompression types are applicable for the column partition values of the container, Vantage uses 0 or more bytes for column partition values.

The byte length of a container is rounded up to a multiple of 8.

The formats for single-column and multicolumn partitions with COLUMN format differ slightly, as listed by the following bullets.
  • A single-column partition with COLUMN format consists only of single-column containers. Each container represents a series of column values.
  • A multicolumn partition with COLUMN format consists of multicolumn containers. Each container represents a series of column partition values where the column partition value is made up of a set of values, one for each column in the partition.

See Row Structure for Containers (COLUMN Format) for more information.

Column Partitions With ROW Format

A physical row of a column partition with ROW format is called a subrow. Subrows have the same format as traditional database rows, but only include the columns defined for the column partition. Note that currently there is no autocompression for subrows.

If Vantage uses ROW format for the column partitions of a column-partitioned object is such that many narrow column partition values occur, there can be a very large increase in the space needed for a column-partitioned object compared to the object without column partitioning. In the worst case, the space required for the column-partitioned object can be as much as 24 times larger.

Because of this, you should not specify ROW format for narrow column partitions. In the worst case, each column partition value can have a row header (14 or 20 bytes plus 6 or more bytes needed for a container) for each column partition value as compared to the same object without column partitioning. An object that is not column-partitioned only has a row header (14 or 20 bytes) for each regular row.

ROW format is useful for wide column partitions where one or only a few values fit in a container, and there is neither much benefit nor a negative impact from row header compression. ROW format provides quicker and more direct access to a specific column partition value than with COLUMN format because when a container has COLUMN format, Vantage must locate the column partition value within the container.

Depending on the autocompression types used for a container, access can be as simple as indexing into the container or it might require a sequential access through bits indicating how a value is compressed or sequential access through the column partition values, or both, to position to the specific column partition value to be accessed.

See Row Structure for Subrows (ROW Format) for more information.