15.00 - Row Compression of Join Indexes - Teradata Database

Teradata Database SQL Data Definition Language Detailed Topics

prodname
Teradata Database
vrm_release
15.00
category
Programming Reference
featnum
B035-1184-015K

Row Compression of Join Indexes

The term compression is used to mean several entirely different things for the Teradata system. Both the multi-value compression and row compression forms are lossless, meaning that the original data can be reconstructed exactly from their compressed forms.

  • The storage organization for join indexes supports a compressed format to reduce storage space.
  • If you know that a join index contains groups of rows with repeating information, then its definition DDL can specify repeating groups, indicating the repeating columns within a separate set of parentheses from the fixed columns. In other words, the column list is specified as two groups of columns, with each group delimited within parentheses. The first group contains the repeating columns and the second group contains the fixed columns.

    Note that you cannot define both row compression and a row‑partitioned primary index for the same join index.

    When describing compression of join index rows, compression refers to a logical row compression in which multiple sets of non‑repeating column values are appended to a single set of repeating column values. This allows Teradata Database to store the repeating value set only once, while any non‑repeating column values are stored as logical segmental extensions of the base repeating set.

    For example, suppose you have a join index ji_no_comp that has five rows. The join index table might look something like this.

     

          column_1

          column_2

          column_3

          column_4

          column_5

                   1

                   2

                   0

                   2

                   4

                   2

                   4

                   8

                   7

                   7

                   1

                   2

                   2

                   3

                   1

                   1

                   2

                   5

                   3

                   3

                   2

                   4

                   3

                   4

                   6

    When you examine the rows of this join index, you notice several repetitions of the value 1 in column_1 paired with the value 2 in column_2. These repeating pairs are highlighted in blue. You also notice that there are several repetitions of the value 2 in column_1 paired with a 4 in column_2. These repeating pairs are highlighted in orange. None of the values for column_3, column_4, or column_5 repeat.

    By defining a new compressed join index for these rows, you can reduce the number of rows stored from five down to two. To do this, you would define the index something like this.

         CREATE JOIN INDEX ji_comp AS
           SELECT (column_1, column_2), (column_3, column_4, column_5)
           FROM table_zee
         PRIMARY INDEX (column_3);

    Because of the way the columns are parenthetically grouped in this request, Teradata Database knows that the first group of columns represents a column set whose values repeat, while the second group of columns represents a column set whose values do not repeat.

    You can think of the way Teradata Database stores the data from the original 5 rows as being something like the following 2 rows, reducing the storage space required for the index from five rows to two. Join index ji_comp uses less disk space than ji_no_comp, but logically represents the same 5 rows. In this representation the repeating column values are highlighted in teal and their accompanying fixed column values are highlighted in purple.

     

    See SQL Data Definition Language Syntax and Examples for more information about the syntax you must use to specify join index row compression.

  • When describing compression of column values (see “Compression of Join Index Column Values” on page 379), compression refers either to multi-value compression, which is the storage of column values one time only in the table header, not in the row itself, and pointing to them by means of an array of presence bits in the row header, or algorithmic compression, which is a user‑defined column multi-value compression method using UDFs (see SQL External Routine Programming for information about how to create algorithmic compression UDFs).
  • The method Teradata Database uses for multi-value compression is called Dictionary Indexing, and it can be viewed as a variation of Run‑Length Encoding (see Database Design for additional information).

    See “Physical Join Index Row Compression” on page 381 for more information about join index row compression.