Using Join Index Row Compression | VantageCloud Lake - Using Join Index Row Compression - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
ft:locale
en-US
ft:lastEdition
2024-12-11
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

Row compression divides rows into repeating and nonrepeating portions. The system appends multiple sets of nonrepeating column values to a single set of repeating column values. Thus the system stores the repeating value set only once, storing any nonrepeating column values as logical extensions of the base repeating set. The nonrepeating column set has a pointer to each repeating column set so the rows can be reconstructed when necessary.

Row compression significantly reduces index storage space for index columns that have a large number of repeating values. You can do the following:

  • Specify the columns to be compressed using the (column_1_name) syntax elements in the SELECT clause of a CREATE JOIN INDEX statement.
  • Specify the columns to remain uncompressed using the (column_2_name) syntax elements in the SELECT clause for the same CREATE JOIN INDEX statement.

Row compression is valuable where column values repeat from row to row. Columns in a join index that have a high percentage of distinct (non-repeating) values are not a good candidates for row compression. Row compression cannot be used with data using row-level security constraints. A join index with row compression cannot be column-partitioned.

Typically, a join index inherits any row compression you define for the index and the MVC compression defined for its base tables.

Example: Join Index Row Compression

Assume that a join index specifies 5 columns:
  • Two columns with repeating values:
    • Column A indicates whether the customer owns a truck or car
    • Column B indicates whether the vehicle is domestic or foreign
  • Three columns of unique or infrequent information:
    • Column C is the customer name
    • Column D is the customer address
    • Column E is the customer phone number

The uncompressed join index row data is as follows:

(car,foreign,joe,addr1,ph1)
(car,foreign,jane,addr2,ph2)
(car,foreign,max,addr3,ph3)
(car,domestic,bill,addr6,ph6)
(car,domestic,linda,addr7,ph7)
(truck,domestic,brad,addr4,ph4)
(truck,domestic,sam,addr5,ph5)
(truck,foreign,dave,addr8,ph8)
(truck,foreign,rick,addr9,ph9)

You can specify row compression in a CREATE JOIN INDEX statement, for example:

CREATE JOIN INDEX  ji_name  AS
     SELECT (col_a,col_b),(col c,col_d,col_e)
     FROM  table_name 
     . . . ;

where the first set of parentheses defines the columns (col_a and col_b) for which duplicate row values are compressed, and the second set of parentheses defines the remaining columns (col c, col_d, and col_e), for which row values are not compressed.

Therefore, the system stores each possible combination of the repeating values for col_a and col_b once, grouped with the related row values for the uncompressed columns, for example:

((car,foreign),(joe,addr1,ph1),(jane,addr2,ph2),(max,addr3,ph3))
((car,domestic),(bill,addr6,ph6),(linda,addr7,ph7))
((truck,domestic),(brad,addr4,ph4),(sam,addr5,ph5))
((truck,foreign),(dave,addr8,ph8),(rick,addr9,ph9))