16.10 - Column-Partitioned NoPI Tables and Join Indexes - Teradata Database

Teradata Database Design

Product
Teradata Database
Release Number
16.10
Release Date
June 2017
Content Type
User Guide
Publication ID
B035-1094-161K
Language
English (United States)

This topic describes the structure of column-partitioned NoPI tables and column-partitioned NoPI join indexes, comparing them with nonpartitioned NoPI tables.

When autocompression methods are described, they are described only to indicate how their use affects the content and size of the row header. Because Teradata Database determines whether to use autocompression and which autocompression methods to use for a container, they are not documented any further in the Teradata Database documentation library.

Options for Storing a NoPI Table

The following graphic shows four options for storing a NoPI table in Teradata Database, where the darkened row for each table represents a table header, one on each AMP, for a physical table. The table header defines the layout and other information about the physical table.

Moving from left to right, the structures represent the layouts of four different methods of NoPI storage.

  • Traditional Teradata Database row storage without partitioning
  • A column-partitioned table with single-column containers (COLUMN format), one for each column
  • A column-partitioned table with single-column containers for columns A and B; multicolumn subrows (ROW format) for columns C and D
  • A column-partitioned table with single-column containers for columns A and B; multicolumn containers for columns C and D


The topics that follow compare various configurations of nonpartitioned NoPI and column-partitioned NoPI tables.

About Column-Partitioned NoPI Tables

If a request requires multiple columns to return the requested result set, the Optimizer query plan includes assembling projected column values from selected table rows together to form the result rows. This can be combined with row partition elimination to further reduce the data that must be accessed to satisfy a query.

Column partitioning has various system-applied compression techniques available to it that can be employed automatically to reduce the storage requirements for a table. These compression methodologies can reduce the I/O required to process queries and, when combined with row partitioning, the number of compression opportunities might increase.

The following CREATE TABLE request creates a table definition with four column partitions, two of which are internal partitions that Teradata Database creates for all column-partitioned tables.

The theoretical maximum number of column partitions for this table is 65,534, including the two column partitions for internal use, and the maximum column partition number is 65,535. The difference is caused by there always being at least one unused column partition number available for altering a column partition.

However, this table can actually have no more than 2,050 column partitions because the total number of columns for a table cannot exceed 2,048. To have 2,050 column partitions, each user-defined column partition could have only one column.

     CREATE TABLE sales_1 (
      (storeid        INTEGER NOT NULL,       /* Partition 1 */
       productid      INTEGER NOT NULL,
       salesdate      DATE FORMAT 'yyyy-mm-dd' NOT NULL,
       totalrevenue   DECIMAL(13,2)),
       (totalsold     INTEGER,                /* Partition 2 */
       topsalesperson INTEGER,
       note           VARCHAR(256)))
     NO PRIMARY INDEX     -- This clause is optional.
     PARTITION BY COLUMN; -- Defines 2 multicolumn partitions plus 2
                          -- additional column partitions for internal
                          -- use.

The following example creates a table definition for a column-partitioned table with row partitioning.

The COLUMN specification in the PARTITION BY clause defines 7 column partitions: 5 column partitions based on the column grouping in the column definition list plus 2 additional column partitions for internal use.

The maximum number of column partitions is 779 including the 2 column partitions for internal use. This means that it is possible to add 770 column partitions to this table. The maximum column partition number is 780.

The RANGE_N function in the partitioning expression defines 48 row partitions with the option to add up to 36 row partitions for a maximum of 84 row partitions. The maximum row partition number is 84.

The maximum combined partition number for this table is 65,520, which is computed from the following product: 780*84. This is the product of the maximum partition number of each partitioning level. The table has 2-byte partitioning because the number of combined partitions is not greater than 65,535.

If you were to create this table using multicolumn partitions, you could add columns up to the maximum of 2,048. You could also specify the ADD option to specify a higher number of column partitions that could be added, but that would result in the table using 8-byte partitioning rather than 2-byte partitioning.

CREATE TABLE sales_2 (
       store_id        INTEGER NOT NULL,
       product_id      INTEGER NOT NULL,
       sales_date      DATE FORMAT 'yyyy-mm-dd' NOT NULL,
       total_revenue   DECIMAL(13,2),
      (total_sold      INTEGER,
       top_salesperson INTEGER,
       note            VARCHAR(256)))
     NO PRIMARY INDEX,      -- This clause is optional.
     PARTITION BY (COLUMN,  -- Defines 4 single-column partitions and
                            -- 1 multicolumn partition plus 2
                            -- additional column partitions for internal
                            -- use.
                   RANGE_N(salesdate BETWEEN DATE '2007-01-01'
                                     AND     DATE '2010-12-31'
                                     EACH INTERVAL '1' MONTH)
                   ADD 36);

You can use column partitioning to improve query performance through column partition elimination.

You can use row partitioning to improve query performance through row partition elimination, which reduces the need to access all of the rows in a table. Teradata Database uses efficient methods to reconstruct qualifying rows from the projected columns of the column partitions that are not eliminated.

Advantages of column partitioning include a simple CREATE TABLE syntax with default autocompression, the ability of the Optimizer to perform column partition elimination, and to facilitate efficient access data from column partitions. Teradata Database manages the column partitions so that the table appears to you as a single object.

Example: Nonpartitioned NoPI Table

The following SQL text is the table definition for a nonpartitioned NoPI table. Note that this table does not have partitioning.

     CREATE TABLE orders (
        order_num   INTEGER NOT NULL,
        order_date  DATE NOT NULL,
        item_num    INTEGER COMPRESS NULL,
        item_desc   CHARACTER(30) COMPRESS NULL)
     NO PRIMARY INDEX;

Because NoPI tables are always multiset tables, Teradata Database does not do duplicate row checks. Teradata Database randomly distributes rows or blocks of rows to the AMPs or locally copies them, as is the case for an INSERT … SELECT request, instead of hash distributing them it does for a table with a primary index or a primary AMP index. In the case of a NoPI table, Teradata Database appends the rows for an AMP to the portion of that table owned by that AMP.

The table header and user data would look something like this.

Table Header                       Row ID  Presence  Bits                                            Orders
  Partition   Number     Hash    Bucket Uniqueness   OrderNum   OrderDate   ItemNum:N  ItemDesc:N
                 
User Data Rows 0 n 1 1,1 100 07-29-2009 756 Hammer
0 n 2 1,1 100 07-29-2009 124 Screwdriver
0 n 3 1,1 100 07-29-2009 437 1-Inch Nails
0 n 4 1,1 290 08-04-2009 110 Shovel
0 n 5 0,1 290 08-04-2009 null Rack
0 n 6 1,1 450 09-01-2009 815 Light Bulb
0 n 7 1,1 501 09-15-2009 437 1-Inch Nails
0 n 8 0,1 530 09-15-2009 null Drill
0 n 9 1,1 625 10-03-2009 815 Light Bulb
0 n 10 1,0 719 10-07-2009 756 null
0 n 11 1,1 768 11-06-2009 756 Hammer

where:

Table element … Specifies …
Partition Number the internal partition number indicating the column partition and row partitions for this row.

Because there is no partitioning for this example, the partition number for every row in the table is 0.

Bits in the flag byte of a physical row indicate no partitioning so a 2-byte or 8-byte partition number of 0 is implied, but does not actually exist in a physical row of a NoPI table.

Hash Bucket a 20-bit (or 16-bit) hash bucket value.
n the lowest numbered hash bucket for this AMP from the NoPI hash map.

This value increases to the next hash bucket from the NoPI hash map for this AMP if the uniqueness overflows.

Teradata Database does not actually compute n by hashing the values of any columns, but whatever AMP the row is randomly sent initially uses its lowest numbered hash bucket from the NoPI hash map.

Uniqueness a uniqueness value.
  • The value is 48 bits for a 16-bit hash bucket.
  • The value is 44 bits for a 20-bit hash bucket.

If the uniqueness value overflows, it resets to 1.

N a nullable column with COMPRESS NULL.
Presence Bits null compression with a presence indicator bit.

If the presence bit is 1, a value is present; otherwise, no value is present. This means the field is null, and the column takes no space because COMPRESS NULL is specified.

No presence bits exist if the column is defined as NOT NULL.

In the example, only 2 presence bits are shown per row because only 2 columns in the table are nullable.

Example: Column-Partitioned NoPI Table With No Autocompression

The following SQL text is the table definition for the orders column-partitioned NoPI table with no autocompression.

     CREATE TABLE orders (
       (order_num   INTEGER NOT NULL),
       (order_date  DATE NOT NULL),
       (item_num    INTEGER COMPRESS NULL),
       (item_desc   CHARACTER(30) COMPRESS NULL CHARACTER SET LATIN)
                    )
     NO PRIMARY INDEX
     PARTITION BY COLUMN NO AUTO COMPRESS;

NO AUTO COMPRESS can be omitted if the default settings are changed to NO AUTO COMPRESS. Because a column-partitioned table is also a multiset table, Teradata Database does not do any duplicate row checks. Column-partitioned NoPI tables are also similar to nonpartitioned NoPI tables in that Teradata Database randomly distributes rows or blocks of rows to the AMPs or are copied locally as is done for an INSERT … SELECT request instead of being hash-distributed as is done for a table with a primary index or a primary AMP index. Teradata Database appends each column partition value of a row for an AMP to the column partition or to the combined partition if multilevel partitioning is being used on that AMP.

This example examines a column-partitioned NoPI table with four column partitions (plus two for internal use) with one column per user-specified partition. The user-specified column partitions are not autocompressed and they all have COLUMN format.

COLUMN format is a series of consecutive column partition values stored in a container, where the rowID of the container is the standard rowID consisting of an internal partition number, a hash bucket number, and a uniqueness value associated with the first column partition value in the container. The internal partition number in this case represents the column and row partition numbers, if any.

The rowIDs associated with subsequent column partition values are implied by their position in the container. The rowIDs associated with the column partition values in the container, which are explicit for the first and implied for the remainder, have the same internal partition number and hash bucket, and the uniqueness increases by one for each column partition value represented by the container.

The file system stores a container based on the rowID in its row header as is currently done for a physical row representing a row in a table that is not column-partitioned.

A column partition has as many containers as needed to hold all of the column partition values of the table columns included in that partition. Teradata Database stores the column partition values in the inserted row order within a container, and the containers are in row-insertion order within a combined partition. To simplify the example, only two containers are shown for each of the column partitions, and each container only represents a small number of column-partition values.

Containers for different column partitions can have a different number of values represented per container. In this example, the first container for item_desc represents 5 column values, while the first container for each of the other columns represent 6 column values. The second container for item_desc starts at uniqueness value 6, while the second container for each of the other columns starts at uniqueness value 7.

Because no rows are deleted, the internal delete column partition is empty. Note that the first uniqueness value for a container plus the number of column partition values represented by that container does not equal the first uniqueness value of the next container if the intervening containers are deleted or if the internal partition number or hash bucket changes for the next container.

A null column in the table takes no place in the column value list.

The internal delete column partition, partition number 65,535 is empty in this case because the table is column-partitioned.

The table header and user data for this example would look something like this.

Table Header Row ID Presence Bits Orders
Partition Number Hash Bucket Unique-ness OrderNum:1,NAC OrderDate:2,NAC ItemNum:3, NAC,N ItemDesc:4,NAC,N
  Column Values  
OrderNum 1 n 1   100   100 100 290 290 450  
  1 n 7   501   530 625 719 768    
OrderDate 2 n 1   07-29-2009   07-29-2009 07-29-2009 08-04-2009 08-04-2009 09-01-2009  
  2 n 7   09-15-2009   09-15-2009 10-03-2009 10-07-2009 11-06-2009    
ItemNum 3 n 1 1,1,1,1,0,1 756   124 437 110 815    
  3 n 7 1,0,1,1,1 437   815 756 756      
ItemDesc 4 n 1 1,1,1,1,1 Hammer Screw-driver 1-Inch Nails Shovel Rack    
  4 n 6 1,1,1,1,0, 1 Light Bulb   1-Inch Nails Drill Light Bulb Hammer    

where:

Table element … Specifies …
Partition Number a 2-byte or 8-byte internal partition number indicating the column partition and row partitions for a container.

Bits in the flag byte of a physical row indicate the partitioning, which in this case is 2-byte partitioning, and a non-0 2-byte part of the internal partition number exists in a physical row of this column-partitioned table.

Hash Bucket a 20-bit (or 16-bit) hash bucket value.
n the lowest numbered hash bucket for this AMP from the NoPI hash map.

This value increases to the next hash bucket for this AMP if the uniqueness overflows.

Teradata Database does not compute n by hashing the values of any columns, but whatever AMP the row is randomly sent initially uses its lowest numbered hash bucket from the NoPI hash map.

Uniqueness a uniqueness value.
  • The value is 48 bits for a 16-bit hash bucket.
  • The value is 44 bits for a 20-bit hash bucket.

If the uniqueness value overflows, it resets to 1.

NAC no autocompression.
N a nullable column with COMPRESS NULL.
Presence Bits null compression with a presence indicator bit.

If the presence bit is 1, a value is present; otherwise, no value is present. This means the field is null, and the column takes no space because COMPRESS NULL is specified.

No presence bits exist if the column is defined as NOT NULL.

In the example, only 2 presence bits are shown per row because only 2 columns in the table, item_num and item_desc, are nullable.

Example: Column-Partitioned NoPI Table With Autocompression

The following SQL text is the table definition for a column-partitioned NoPI table with autocompression.

     CREATE TABLE orders (
       order_num   INTEGER NOT NULL,
       order_date  DATE NOT NULL,
       item_num    INTEGER,
       item_desc   CHARACTER(30) CHARACTER SET LATIN)
     NO PRIMARY INDEX
     PARTITION BY COLUMN AUTO COMPRESS;

AUTO COMPRESS can be omitted as long as the default settings have not been changed to NO AUTO COMPRESS. With the exception of the column partitions being autocompressed in this example, it is identical to the example used in Example: Column-Partitioned NoPI Table With No Autocompression.

A column field in the table takes no place in the column value list.

Values in italic typeface are in the local value list dictionary for the container.

The internal delete column partition, partition number 65,535 in this case because the table is only column-partitioned, is empty.

Table Header Row ID ACT Autocompression Bits Orders
Partition Number Hash Bucket Unique-ness OrderNum:1,NAC OrderDate:2,NAC ItemNum:3, NAC,N ItemDesc:4,NAC,N
  Column Values  
OrderNum 1 n 1 R2

TT01 2,0

1111,1010,0110 100 290 450    
  1 n 7 TT01 2,0 10,10,10,10,10 501 530 625 719 768
OrderDate 2 n 1 R2 11,10,01 07-29-2009 08-04-2009 09-01-2009    
  2 n 7 PV 3,0,1 001,001,010,011,100 10-03-2009 10-07-2009 11-06-2009 09-15- 2009  
ItemNum 3 n 1 P 1,1,1,1,0,1 756 124 437 110 815
  3 n 7 PV 3,0,1 010,000,011,001,100 437 815 756    
ItemDesc 4 n 1 TL01 4,’ ’ 0110,1011,1100,

0110,0100

Hammer Screwdriver 1-Inch Nails Shovel Rack
  4 n 6 PV 3,0,1

TL01 4,’ ’

0011010,010,1100,

0110101,0011010,

0000000,1000110

1-Inch Nails Drill Hammer Light Bulb  

where:

Table element … Specifies …
Partition Number a 2-byte or 8-byte internal partition number indicating the column partition and row partitions for a container.

Bits in the flag byte of a physical row indicate the partitioning, which in this case is 2-byte partitioning, and a non-0 2-byte part of the internal partition number exists in a physical row of this column-partitioned table.

Hash Bucket a 20-bit (or 16-bit) hash bucket value.
n the lowest numbered hash bucket for this AMP from the NoPI hash map.

This value increases to the next hash bucket for this AMP if the uniqueness overflows.

Teradata Database does not compute n by hashing the values of any columns, but whatever AMP the row is randomly sent initially uses its lowest numbered hash bucket from the NoPI hash map.

Uniqueness a uniqueness value.
  • The value is 48 bits for a 16-bit hash bucket.
  • The value is 44 bits for a 20-bit hash bucket.

If the uniqueness value overflows, it resets to 1.

AC autocompression for containers of the column
N a nullable column.
ACT autocompression type.

The internal delete column partition, column partition 65,535, is empty because the table is column-partitioned.

The following bullets explain the autocompression used for each container in this example.

  • First OrderNum container.

    Teradata Database applied 2 autocompression techniques to this container.

    • The first two bits of each set of 4 autocompression bits are for R 2 (run length) autocompression, where the unsigned value of the two bits is the run length.
    • The second two bits are for TT01 2,0 (trim high-order bytes that are zero) autocompression.

      The number of sets of bits is the same as the number of values in this container since nulls do not need to be represented by this container.

      This container represents 6 values, with the uniqueness for rowIDs starting at 1 and continuing up to 6. Because the first value, 100, has a run length of 3, the second value, 290, has a run length of 2, and the last value, 450, has a run length of 1.

      The TT01 bits for the first value, 100, means that the value 100 is stored in the container as a single byte, and the 3 high-order 0 bytes are stripped.

      The TT01 bits for the second value, 290, means that the value is stored as two bytes, and the 2 high-order 0 bytes are stripped.

      The third value, 450, means the value is stored as 2 bytes, and the 2 high-order bytes are stripped, even though the data type is INTEGER which normally requires 4 bytes.

  • Second OrderNum container.

    Teradata Database applied only TT01 2,0 autocompression to this container because it has no runs.

    There is one set of bits for each value, but in this case, there are only 2 bits per value because run length bits are not included. The container represents a sequence of 5 values, with the uniqueness value for rowIDs starting at 7 as indicated by the rowID of the container and continuing up to 11.

    Each of the 5 values is stored as 2 bytes and the 2 high-order 0 bytes are stripped from each value.

  • First OrderDate container.

    Teradata Database applied R 2 (run length) autocompression to this container.

    The number of sets of bits is the same as the number of values in this container because it does not need to represent nulls.

    The container represents 6 values, with the uniqueness value for rowIDs starting at 1 and continuing up to 6. Because the first value, 07-29-2009, has a run length of 3, the second value, 08-04-2009, has a run length of 2, and the last value, 09-01-2009, has a run length of 1.

  • Second OrderDate container.

    Teradata Database applied PV 3,0,1 (single null and local value list) autocompression to this container.

    In this case, there are 3 bits for each value represented, but only 3 values and 1 local value list dictionary value are stored in the container.

    The container represents a sequence of 5 values, with the uniqueness value for rowIDs starting at 7 and continuing up to 11. The autocompression bits for the first 2 values indicate that their values are not present and to use the first entry in the local value list dictionary as their values.

    The autocompression bits for the next 3 values indicate the value is present, not compressed, and an index to the value in the list of values.

  • First ItemNum container.

    Teradata Database applied P (single null) autocompression to this container.

    There is one bit for each of the 6 values represented, but only 5 values in the container because ItemNum is null for the fifth row of the table and a value is not present.

    Note that a null takes no space even though COMPRESS NULL is not specified because autocompression is applied. It is also possible that space for the column partition value could be in the container even if COMPRESS is specified because the system determines the compression used when autocompression is in effect.

  • Second ItemNum container.

    Teradata Database applied PV 3,0,1 (single null and local value list) autocompression to this container.

    In this case, there are 3 bits for each of the 5 values represented but only 2 values and 1 local value list dictionary value are stored in the container.

    This container represents a sequence of 5 values (with the uniqueness for rowIDs starting at 7 and continuing to 11). The autocompression bits for the first and third values indicate that the values are present (437 and 815, respectively), not compressed, and that there is an index to the value in the list of values.

    The bits for the second value indicate that it is null and not present. The bits for the fourth and fifth values indicate that their values are not present and that Teradata Database should use the first entry in the local value list dictionary as their values.

  • First ItemDesc container.

    Teradata Database applied TL01 4,' ' (trim trailing single-byte space for fixed-length data type) autocompression to this container.

    In this case, there are 4 bits for each of the 5 values represented and 5 values (stripped of trailing spaces) in the container.

    The 4 bits for a value indicate its length after trimming trailing spaces. The uncompressed value can be obtained from the value in the container by adding trailing spaces as needed to a fixed length of 30. In this case, the first container represents only 5 values, but the total number of values represented by both containers of ItemDesc is 11, the same as the total for the other columns.

  • Second ItemDesc container.

    Teradata Database applied PV 3,0,1 (single null and local value list) and TL01 4,' ' (trim trailing single-byte space for fixed-length data type) autocompression to this container.

    In this case, there are 7 bits for each of the 6 values represented, but only 3 values and 1 local value list dictionary value are stored in the container.

    The first 3 bits indicate whether the value is null (00) and not present, is not present and to use the entry in the local value-list dictionary (01) as its value, or present (010, 011, 100, 110, 111 indicating an index in the list of values). The other 4 bits indicate the length of the value, if present, after trimming trailing spaces. The length is 0000 if the field is null, so it is ignored.

    This container represents a sequence of values, with the uniqueness for rowIDs starting at 6 as indicated by the rowID of the container and continuing to 11. The autocompression bits for the first and fourth values indicate that their values (Light Bulb) are not present and to use the entry in the local value list dictionary as their values.

    The bits for the second, third, and sixth values (1-inch Nails, Drill, and Hammer, respectively) indicate that the values are present, but stripped of trailing spaces. The bits for the fifth value indicate that it is null and not present.

    The stripped values can be extended with spaces to their fixed length of 30 when decompressing the values.

The types of autocompression that Teradata Database chooses to apply to the containers for a table are chosen on a container by container basis, as this example demonstrates. Teradata Database could just as easily apply other autocompression types, depending on the context for each container to which autocompression was applied. Optionally, some column partitions can be set for AUTO COMPRESS and some for NO AUTO COMPRESS by specifying AUTO COMPRESS or NO AUTO COMPRESS as appropriate for each column partition to override the default.