Comparing the Number of I/O Operations Required to Answer the Same SELECT Request - Teradata Database

Teradata Database Design

Product
Teradata Database
Release Number
15.10
Language
English (United States)
Last Update
2018-10-06
Product Category
Software

Comparing the Number of I/O Operations Required to Answer the Same SELECT Request

The purpose of this topic is to compare the number of I/O operations required to perform the identical query against a nonpartitioned primary‑indexed table, a row-partitioned table, a single‑level column‑partitioned table, and a multilevel column-partitioned table.

Each of the tables examined contains the same 4 million rows of data and each is probed using the identical SELECT request. The only difference is the configuration of the table containing the 4 million rows.

The documented I/O figures are only approximations to demonstrate the relative differences in I/O operations for the same SELECT request executed on the same table, the only difference among the table versions being whether they are indexed or not and the type of partitioning used, if any.

The first table, io_pi, has a nonpartitioned primary index and has the following definition.

     CREATE TABLE io_pi (
       a INTEGER,
       b INTEGER,
       c CHARACTER(100),
       d INTEGER,
       e INTEGER,
       f INTEGER,
       g INTEGER,
       h INTEGER,
       i INTEGER,
       j INTEGER,
       k INTEGER,
       l INTEGER)
     PRIMARY INDEX (a)

You can see the general organization of the table from a few sampled rows.

 

 a

 b

 c

 d

 e

 f

 g

 h

 i

 j

 k

 l

1

5

a

3

9

9

4

6

2

7

4

5

2

9

q

5

4

6

3

8

5

1

1

2

3

1

d

1

1

3

3

4

7

8

2

9

4

8

m

7

3

9

4

1

4

2

8

6

5

3

f

2

2

4

7

3

1

5

7

2

6

6

r

1

8

2

8

3

4

2

5

1

7

2

e

0

5

1

6

4

3

9

9

7

8

4

u

9

0

1

2

7

6

6

0

3

9

2

d

3

7

5

1

2

6

3

3

8

Now submit the following SQL request, which reads all 4 million rows in io_pi.

     SELECT SUM(f)
     FROM io_pi
     WHERE b BETWEEN 4 AND 7;

To read the required rows, Teradata Database used 9,987 I/O operations.

The next table is a row-partitioned primary-indexed table with the following definition.

     CREATE TABLE io_ppi (
       a INTEGER,
       b INTEGER,
       c CHARACTER(100),
       d INTEGER,
       e INTEGER,
       f INTEGER,
       g INTEGER,
       h INTEGER,
       i INTEGER,
       j INTEGER,
       k INTEGER,
       l INTEGER)
     PRIMARY INDEX (a)
     PARTITION BY RANGE_N(b BETWEEN 1
                            AND     9
                            EACH    1));

After loading io_ppi with the same 4 million rows, you submit the same request against io_ppi that you had earlier submitted against io_pi.

Now submit the following SQL request, which reads only 4 row partitions in io_pi.

     SELECT SUM(f)
     FROM io_ppi
     WHERE b BETWEEN 4 AND 7;

Teradata Database must only read 4 row partitions, but still took 4,529 I/O operations to read those four partitions.

The next table is a single‑level column‑partitioned NoPI table with the following definition.

     CREATE TABLE io_cp_col_nopi (
       a INTEGER,
       b INTEGER,
       c CHARACTER(100),
       d INTEGER,
       e INTEGER,
       f INTEGER,
       g INTEGER,
       h INTEGER,
       i INTEGER,
       j INTEGER,
       k INTEGER,
       l INTEGER)
     PARTITION BY COLUMN;

Now submit the following request, which reads only 2 column partitions in io_cp_nopi.

     SELECT SUM(f)
     FROM io_cp_col_nopi
     WHERE b BETWEEN 4 AND 7;

Teradata Database must read only 2 column partitions, but still took 281 I/O operations to read those 2 partitions.

The next table is a multilevel column‑partitioned NoPI table with the following definition.

     CREATE TABLE io_cp_rowcol_nopi (
       a INTEGER,
       b INTEGER,
       c CHARACTER(100),
       d INTEGER,
       e INTEGER,
       f INTEGER,
       g INTEGER,
       h INTEGER,
       i INTEGER,
       j INTEGER,
       k INTEGER,
       l INTEGER)
     PARTITION BY (COLUMN,
                   RANGE_N(b BETWEEN 1
                             AND     9
                             EACH    1));

Now submit the following SQL request, which reads only 4 row partitions of 2 column partitions in io_cp_rowcol_nopi.

     SELECT SUM(f)
     FROM io_cp_rowcol_nopi
     WHERE b BETWEEN 4 AND 7;

Teradata Database only needed to perform 171 I/O operations to return the result set for this request.

Summarizing this example set, each table in the set contains 4 million rows, and each must solve the identical SELECT request. The following table summarizes the number of I/O operations each table required to return the requested answer set.

 

                                       Partitioning Used

Number of I/O Operations Required

None.

Table has a nonpartitioned primary index.

PRIMARY INDEX

          9,987

Table has a single‑level row-partitioned primary index.

PRIMARY INDEX
PARTITION BY RANGE_N(b BETWEEN 1
                       AND     9
                       EACH    1)

          4,529

Table has no primary index and single‑level column partitioning.

PARTITION BY COLUMN

           281

Table has no primary index and multilevel column and row partitioning.

PARTITION BY (COLUMN,
              RANGE_N(b BETWEEN 1
                        AND     9
                        EACH    1)

           171

The conclusion to draw from the data in these examples is that column and row partition elimination is an effective performance optimization technique, and the more partitions that can be eliminated when responding to a request, the better.

For this particular query workload, the multilevel column‑partitioned and row-partitioned table requires the fewest I/O operations. You should not interpret this to mean that multilevel column partitioning is a universally optimal performance enhancement strategy.