16.20 - Row Compression of Join Indexes - Teradata Database - Teradata Vantage NewSQL Engine

Teradata Vantage™ SQL Data Definition Language Detailed Topics

Product
Teradata Database
Teradata Vantage NewSQL Engine
Release Number
16.20
Release Date
March 2019
Content Type
Programming Reference
Publication ID
B035-1184-162K
Language
English (United States)
The term compression has multiple meanings for the Teradata system. Multivalue compression and row compression forms are lossless, meaning that the original data can be reconstructed exactly from their compressed forms.
  • The storage organization for join indexes supports a compressed format to reduce storage space.

    If you know that a join index contains groups of rows with repeating information, then its definition DDL can specify repeating groups, indicating the repeating columns within a separate set of parentheses from the fixed columns. In other words, the column list is specified as two groups of columns, with each group delimited within parentheses. The first group contains the repeating columns and the second group contains the fixed columns.

    You cannot define both row compression and a row-partitioned primary index for the same join index.

    When describing compression of join index rows, compression refers to a logical row compression in which multiple sets of non-repeating column values are appended to a single set of repeating column values. Teradata Database stores the repeating value set only once, while any non-repeating column values are stored as logical segmental extensions of the base repeating set.

    For example, suppose you have a join index ji_no_comp that has five rows. The join index table might look something like this.

          column_1       column_2       column_3       column_4       column_5
                   1                2                0                2                4
                   2                4                8                7                7
                   1                2                2                3                1
                   1                2                5                3                3
                   2                4                3                4                6

    When you examine the rows of this join index, you notice several repetitions of the value 1 in column_1 paired with the value 2 in column_2. These repeating pairs are highlighted in teal. You also notice that there are several repetitions of the value 2 in column_1 paired with a 4 in column_2. These repeating pairs are highlighted in tan. None of the values for column_3, column_4, or column_5 repeat.

    By defining a new compressed join index for these rows, you can reduce the number of rows stored from five down to two. To do this, you would define the index something like this.

         CREATE JOIN INDEX ji_comp AS
           SELECT (column_1, column_2), (column_3, column_4, column_5)
           FROM table_zee
         PRIMARY INDEX (column_3);

    Because of the way the columns are parenthetically grouped in this request, Teradata Database knows that the first group of columns represents a column set whose values repeat, while the second group of columns represents a column set whose values do not repeat.

    You can think of the way Teradata Database stores the data from the original 5 rows as being something like the following 2 rows, reducing the storage space required for the index from five rows to two. Join index ji_comp uses less disk space than ji_no_comp, but logically represents the same 5 rows. In this representation the repeating column values are highlighted in teal and their accompanying fixed column values are highlighted in tan:

        (1,2)   (0,2,4)   (2,3,1)   (5,3,3)
        (2,4)   (8,7,7)   (3,4,6)  

    For more information about the syntax you must use to specify join index row compression, see Teradata Vantage™ SQL Data Definition Language Syntax and Examples, B035-1144.

  • When describing compression of column values, compression refers either to multivalue compression, which is the storage of column values one time only in the table header, not in the row itself, and pointing to them by means of an array of presence bits in the row header, or algorithmic compression, which is a user-defined column multivalue compression method using UDFs. See Compression of Join Index Column Values. For information about creating algorithmic compression UDFs, see Teradata Vantage™ SQL External Routine Programming , B035-1147 .

    The method Teradata Database uses for multivalue compression is called Dictionary Indexing, and it can be viewed as a variation of Run-Length Encoding. For additional information, see Teradata Vantage™ - Database Design, B035-1094.

For more information about join index row compression, see Physical Join Index Row Compression.