15.00 - General Row Structure When Compressing Variable Length Columns - Teradata Database

Teradata Database Design

prodname
Teradata Database
vrm_release
15.00
category
User Guide
featnum
B035-1094-015K

General Row Structure When Compressing Variable Length Columns

The following information describes aspects of general row structure when one or more variable length columns in a table are compressed. The description applies to columns having any of the following data types:

  • VARBYTE
  • VARCHAR
  • VARCHAR(n) CHARACTER SET GRAPHIC
  • UDT
  • CLOB/BLOB
  • The description assumes that all variable length compressible columns are treated as an extension of fixed length compressible columns except that decompressed variable length columns store both a length and the actual column data. There is an additional presence bit for an algorithmically compressed column to indicate whether the column data is compressed or not. Teradata Database sets this bit only when data in a column is compressed using algorithmic compression.

    When column data is compressed algorithmically, Teradata Database stores it as length: data pairs interleaved with the other compressible columns in the table. When column data is null, Teradata Database does not store a length, so there is no overhead in that case.

    The following summary information applies to general row structure elements for the storage of variable length column data for multi-value compression.

  • When compression does not apply to a value in a column that specifies multi-value compression, Teradata Database stores the column data in the row as a length and data value pair.
  • If the length of the column data is <= 255 bytes, then Teradata Database stores the length in 1 byte; otherwise it stores the length in 2 bytes.

    A variable length column (without compression) always has a 2-byte offset associated with it, whereas a compressed variable length column can have its length stored in one or two bytes, depending on whether the column length is <= 255 or not, respectively.

  • Teradata Database stores uncompressed variable length data in the row as a length and data value pair.
  • The stored length for multi-value compressed data is the length of the uncompressed column value.

  • If a variable or fixed length multi-value compressed column is compressed or is null, then its values are not stored in the row.
  • Instead, Teradata Database stores one instance of each compressed value within a column in the table header and references it using the presence bits array for the row.

  • If a fixed length multi-value compressed column is not compressed, then Teradata Database stores its values in the row as data, but without an accompanying length. No length is needed because Teradata Database pads fixed length data values to the maximum length defined for their containing column.
  • The following summary information applies to general row structure elements for the storage of variable length column data for algorithmic compression.

  • If a variable length column is not compressed, then Teradata Database stores its values in the row as a length and data value pair.
  • If a fixed length column is not compressed, then Teradata Database stores its values in the row as data, but without an accompanying length.
  • Teradata Database pads fixed length data values to the maximum length defined for their containing column.

  • For an algorithmically‑compressed column with a variable length, Teradata Database stores its values in the row as a length and data value pair.
  • The stored length for algorithmically‑compressed data is the length of the compressed column value, not its original, uncompressed length.

    If the length of the column data is 255 bytes, then Teradata Database stores the length in 1 byte; otherwise, it stores the length in 2 bytes.

  • For an algorithmically‑compressed column with either a variable length or a fixed length, Teradata Database does not store a representation of its data in the row if the column is null, but does indicate its existence in the data using the presence bit array for the row.
  • The following sets of examples demonstrate the specific effects on row structure for multi-value compression alone (cases 1, 2, and 3 of example 1), algorithmic compression alone (cases 1, 2, and 3 of example 2), and combined multi-value and algorithmic compression (cases 1, 2, and 3 of example 3). In each case, the row structure diagram is based on the structure for a PPI table (see “Packed64 Row Structure for a Partitioned Table” on page 754) even though the examples are all for nonpartitioned primary index tables.

    Example 1: Algorithmic Compression But No Multi-Value Compression

    Suppose you define the following table to uses Huffman encoding algorithms to compress and decompress two of its columns algorithmically. This table specifies algorithmic compression on columns alc1 and alc2, but no multi-value compression. Particularly relevant data for each case is highlighted in red boldface type.

         CREATE TABLE t1 (
           fc1  INTEGER,
           alc1 VARCHAR(10) COMPRESS ALGCOMPRESS huffcomp
                                     ALGDECOMPRESS huffdecomp,
           vc1  VARCHAR(20),
           alc2 VARCHAR(10) COMPRESS ALGCOMPRESS huffcomp
                                     ALGDECOMPRESS huffdecomp,
           vc2  VARCHAR(20) );

    Field 5 of the table header for this table has the following field descriptors.

                             UDT or
                  Num-  Next Com-                   Pres-
    In- Fld  Off- Comp- Fld  press  Calc   Sto-     ence     Sort
    dex id   set  ress  Ind  offset offset rage     Byte Bit Desc   EvlRepr
    --- ---- ---- ----- ---  ------ ------ -------- ---- --- ------ ------------------
     1  1025  20         2     0    Offset Nullable   0   1  AscKey DBC INTEGER
     2  1026  -          4    196   Comprs alc+Null1  0   2  NonKey DBC VARCHAR(10) LATIN
     3  1027  12         5     0    Var    Nullable   0   4  NonKey DBC VARCHAR(20) LATIN
     4  1028  -          3    206   Comprs alc+Null1  0   5  NonKey DBC VARCHAR(10) LATIN
     5  1029  14         0     0    Var    Nullable   0   7  NonKey DBC VARCHAR(20) LATIN

    To be as general as possible, each row structure diagram indicates the configuration of fields as they would be if the table had a partitioned primary index, though none of the actual table creation SQL text specifies a PPI. The diagrams also assume the system has a Packed64 row structure.

    Case 1

    Suppose you have the same table, but without any algorithmic compression being defined. The row structure for this table looks something like the following diagram.

    The following abbreviations are used for the various fields of the diagram for this case.

     

    Abbreviation

                                                                   Definition

    fc1

    Uncompressed data for column fc1, a fixed length data type.

    len1

    Length of the data in column alc1, an algorithmically‑compressed variable length data type.

    alc1

    Algorithmically‑compressed data for column alc1, a variable length data type.

    len2

    Length of the data in column alc2, an algorithmically‑ compressed variable length data type.

    alc2

    Algorithmically‑compressed data for column alc2, a variable length data type.

    vc1

    Uncompressed data for column vc1, a variable-length data type.

    vc2

    Uncompressed data for column vc2, a variable-length data type.

    In this case, the data values for columns alc1 and alc2 are both stored in the row and are not compressed.

    Teradata Database has placed the variable length compressible columns after the fixed length columns in the compressible columns area of the row. This is also reflected in the Field5 descriptor. Because the columns are not compressed, Teradata Database stores lengths with the two alc columns. len1 and len2 contain the uncompressed lengths of those columns, and the ALC bit is not set.

    Case 2

    In the next case, algorithmic compression has been defined, and the data for columns alc1 and alc2 is not null. alc1 and alc2 are present in the row and compressed. len1 and len2 contain the new compressed length. The ALC bit has been set for this case to indicate that the data in columns alc1 and alc2 is compressed.

    For this case, the row structure for the table superficially looks exactly like the row structure diagram for the previous case.

    The following abbreviations are used for the various fields of the diagram for this case.

     

    Abbreviation

                                                                   Definition

    fc1

    Uncompressed data for column fc1, a fixed length data type.

    len1

    Length of the data in column alc1, an algorithmically‑compressed variable length data type.

    alc1

    Algorithmically‑compressed data for column alc1, a variable length data type.

    len2

    Length of the data in column alc2, an algorithmically‑compressed variable length data type.

    alc2

    Algorithmically‑compressed data for column alc2, a variable length data type.

    vc1

    Uncompressed data for column vc1, a variable-length data type.

    vc2

    Uncompressed data for column vc2, a variable-length data type.

    Case 3

    In the next case, algorithmic compression has been defined, but the data for columns alc1 and alc2 is null. alc1 and alc2 are present in the row and compressed. len1 and len2 contain the new compressed length. The ALC bit is not set for this case because column alc1 is null.

    The following abbreviations are used for the various fields of the diagram for this case.

     

    Abbreviation

                                                                   Definition

    fc1

    Uncompressed data for column fc1, a fixed-length data type.

    len1

    Not represented.

    This column is null, so no length for column alc2 is stored in the row.

    alc1

    Not represented.

    This column is null, so no value for alc1 is stored in the row.

    len2

    Length of the data in column alc2, an algorithmically‑compressed variable length data type.

    alc2

    Algorithmically‑compressed data for column alc2, a variable length data type.

    vc1

    Uncompressed data for column vc1, a variable-length data type.

    vc2

    Uncompressed data for column vc2, a variable-length data type.

    Example 2: Multi-Value Compression But No Algorithmic Compression

    This example presents row structure cases based on the following table definition. This table has multi-value compression defined on columns mvc1 and mvc2, but no algorithmic compression. Particularly relevant data for each case is highlighted in red boldface type.

         CREATE TABLE t1 (
           fc1  INTEGER,
           mvc1 VARCHAR(10) COMPRESS (‘mars’,’saturn’,’jupiter’),
           vc1  VARCHAR(20),
           mvc2 VARCHAR(10) COMPRESS (‘Germany’,’France’,’England’),
           vc2  VARCHAR(20) );

    Field 5 of the table header for this table has the following field descriptors.

                             UDT or
                  Num-  Next Com-                   Pres-
    In- Fld  Off- Comp- Fld  press  Calc   Sto-     ence     Sort
    dex id   set  ress  Ind  offset offset rage     Byte Bit Desc   EvlRepr
    --- ---- ---- ----- ---  ------ ------ -------- ---- --- ------ ------------------
     1  1025  20         2     0    Offset Nullable   0   1  AscKey DBC INTEGER
     2  1026  -          4    196   Comprs cmp+Null2  0   2  NonKey DBC VARCHAR(10) LATIN
     3  1027  14         5     0    Var    Nullable   0   5  NonKey DBC VARCHAR(20) LATIN
     4  1028  -          3    206   Comprs cmp+Null1  0   6  NonKey DBC VARCHAR(10) LATIN
     5  1029  16         0     0    Var    Nullable   0   1  NonKey DBC VARCHAR(20) LATIN

    Case 1

    In this case, the data values for columns mvc1 and mvc2 are both stored in the row because they are not compressed for the particular data values they contain.

    Teradata Database has placed the variable length compressible columns after the fixed length columns in the compressible columns area of the row. This is also reflected in the Field5 descriptor. len1 and len2 contain the uncompressed lengths of those columns.

    The following abbreviations are used for the various fields of the diagram for this case.

     

    Abbreviation

                                                                   Definition

    fc1

    Uncompressed data for column fc1, a fixed-length data type.

    len1

    Compressed length of the data in column mvc1, a multi-value compressed variable length column.

    mvc1

    Multi-value compressed data for column mvc1.

    len2

    Compressed length of the data in column mvc2, a multi-value‑compressed variable length column.

    mvc2

    Multi-value compressed data for column mvc2.

    vc1

    Uncompressed data for column vc1, a variable-length data type.

    vc2

    Uncompressed data for column vc2, a variable-length data type.

    Case 2

    The next case demonstrates row storage when the column data in the row is to be multi-value compressed. In this case, the value for column mvc1 is not stored in the row because it is compressed. The value is instead stored in the table header and referenced by the presence bits array for the row.

    The following abbreviations are used for the various fields of the diagram for this case.

     

    Abbreviation

                                                                   Definition

    fc1

    Uncompressed data for column fc1, a fixed-length data type.

    len1

    Not represented.

    This column is null, so no length for column mvc1 is stored in the row.

    mvc1

    Not represented.

    This column is null, so no value for mvc1 is stored in the row.

    len2

    Compressed length of the data in column mvc2, a multi-value‑compressed variable-length column.

    mvc2

    Multi-value compressed data for column mvc2.

    vc1

    Uncompressed data for column vc1, a variable-length data type.

    vc2

    Uncompressed data for column vc2, an uncompressed variable-length data type.

    Case 3

    The next case demonstrates row storage when the column data for mvc1 is null. In this case, the value for column mvc1 is not stored in the row because nulls are always compressed. This demonstrates that nulls never incur a cost overhead for compression.

    The following abbreviations are used for the various fields of the diagram for this case.

     

    Abbreviation

                                                                   Definition

    fc1

    Uncompressed data for column fc1, a fixed-length data type.

    len1

    Not represented.

    This column is null, so no value for mvc1 is stored in the row.

    mvc1

    Not represented.

    This column is null, so no value for mvc1 is stored in the row.

    len2

    Compressed length of the data in column mvc2, a multi-value compressed variable-length column.

    mvc2

    Multi-value compressed data for column mvc2.

    vc1

    Uncompressed data for column vc1, a variable-length data type.

    vc2

    Uncompressed data for column vc2, a variable-length data type.

    Example 3: Mix of Multi-Value and Algorithmic Compression

    This example presents row structure cases based on the following table definition. This table has multi-value compression defined on columns mvc_f1 and mvc_v1, and algorithmic compression defined on column alc1. Particularly relevant data for each case is highlighted in red boldface type.

         CREATE TABLE t1(
           fc1    INTEGER,
           alc1   VARCHAR(10) COMPRESS ALGCOMPRESS huffcomp
                                       ALGDECOMPRESS huffdecomp,
           vc1    VARCHAR(20),
           mvc_f1 CHARACTER(10) COMPRESS (‘Germany’,’France’,’England’),
           mvc_v1 VARCHAR(20) COMPRESS (‘Nike’,’Reebok’,’Adidas’) );

    Field 5 of the table header for this table has the following field descriptors.

                             UDT or
                  Num-  Next Com-                   Pres-
    In- Fld  Off- Comp- Fld  press  Calc   Sto-     ence     Sort
    dex id   set  ress  Ind  offset offset rage     Byte Bit Desc   EvlRepr
    --- ---- ---- ----- ---  ------ ------ -------- ---- --- ------ ------------------
     1  1025  20         2     0    Offset Nullable   0   1  AscKey DBC INTEGER
     2  1026  -          4    196   Comprs alc+Null1  0   2  NonKey DBC VARCHAR(10) LATIN
     3  1027  14         0     0    Var    Nullable   0   5  NonKey DBC VARCHAR(20) LATIN
     4  1028  -          5    226   Comprs cmp+Null2  0   6  NonKey DBC VARCHAR(10) LATIN
     5  1029  -          3    256   Comprs cmp+Null2  1   1  NonKey DBC VARCHAR(20) LATIN

    Case 1

    In this case, no compression has been specified for the particular data values, so data for columns alc1, mvc_f1, and mvc_v1 is stored in the row.

    The following abbreviations are used for the various fields of the diagram for this case.

     

    Abbreviation

                                                                 Definition

    fc1

    Uncompressed data for column fc1, a fixed length data type.

    len1

    Compressed length of the data in column alc1, an algorithmically‑compressed variable length data type.

    alc1

    Algorithmically‑compressed data for the variable length column alc1.

    mvc_f1

    Multi-value compressed data for the fixed length column mvc_f1.

    len2

    Length of the data in column mvc_f1, a multi-value compressed variable length data type.

    mvc_v1

    Multi-value compressed data for the variable length column mvc_v1.

    vc1

    Uncompressed data for column vc1, a variable length data type.

    Teradata Database has placed the variable compressible columns after the fixed field columns in the compressible columns area. This is also reflected in the Field5 descriptor. The compressed column data is interleaved with other compressible fields. len1 and len2 contain the uncompressed lengths of alc1 and mvc_v1, respectively.

    Case 2

    In this case, Teradata Database compresses the data for columns mvc_v1 and alc1 because the values for the row are specified in the multi-value for those columns.

    The following abbreviations are used for the various fields of the diagram for this case.

     

    Abbreviation

                                                                 Definition

    fc1

    Uncompressed data for column fc1, a fixed length data type.

    len1

    Compressed length of the data in column alc1, an algorithmically‑compressed variable length data type.

    alc1

    Algorithmically‑compressed data for column alc1, a variable length data type.

    mvc_f1

    Multi-value compressed data for column mvc_f1, a fixed length data type.

    mvc_v1

    Not represented.

    This column is multi‑value compressed, so no value for mvc1 is stored in the row.

    vc1

    Uncompressed data for column vc1, a variable length data type.

    In this case, Teradata Database does not store the value for mvc_v1 in the row because it is compressed for the particular data value, so the value is stored in the table header. len1 contains the compressed length for alc1.

    Case 3

    In this case, the data for the compressible columns mvc_v1 and alc1 is null, so Teradata Database does not store values for those columns in the row. This example demonstrates that nulls never incur a cost overhead for compression.

    The following abbreviations are used for the various fields of the diagram for this case.

     

    Abbreviation

                                                                 Definition

    fc1

    Uncompressed data for column fc1, a fixed length data type.

    len1

    Not represented.

    This column is null, so no value for alc1 is stored in the row.

    alc1

    Not represented.

    This column is null, so no value for alc1 is stored in the row.

    mvc_f1

    Multi-value compressed data for column mvc_f1, a fixed length data type.

    len2

    Not represented.

    This column is null, so no value for mvc_v1 is stored in the row.

    mvc_v1

    Not represented.

    This column is null, so no value for mvc_v1 is stored in the row.

    vc1

    Uncompressed data for column vc1, a variable length data type.