Storing Data Efficiently | Database Design | Teradata Vantage - 17.10 - Storing Data Efficiently - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - Database Design

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Release Date
July 2021
Content Type
User Guide
Publication ID
B035-1094-171K
Language
English (United States)

Specifying DECIMAL and NUMERIC Precisions

Storage requirements for decimal data types are a function of the precision required of the values to be stored. The following table indicates the number of bytes used to store DECIMAL and NUMERIC data types of various precisions.

Number of Precision Digits Number of Bytes Stored
1 - 2 1
3 - 4 2
5 - 9 4
10 - 18 8

You can achieve optimal space savings by using multivalue compression together with an efficient decimal size. An example would be to define decimal precision such that a 4-byte representation was used instead of an 8-byte representation, for example DECIMAL(9) instead of DECIMAL(10). Then, multivalue compression could be used on the most frequently occurring values to reduce the storage overhead still further.

Specifying INTEGER Precisions

Like the DECIMAL/NUMERIC data type, the INTEGER family of data types offers different levels of precision that you can harness to reduce the number of bytes used to store integer numbers.

The following table indicates the number of bytes used to store integer numbers of various precisions:

Integer Data Type Number of Bytes Stored
BYTEINT 1
SMALLINT 2
INTEGER 4
BIGINT 8

You can achieve optimal space savings by using multivalue compression together with an efficient integer data type. An example would be to define the precision such that a 1-byte representation was used instead of a 4-byte representation: for example, BYTEINT instead of INTEGER. Then, multivalue compression could be used on the most frequently occurring values to reduce the storage overhead still further.

Using Standardized Encodings

There are many standardized encodings of commonly used words. For example, the United States Postal Service (USPS) encodes all U.S. states and possessions using a two-character abbreviation. The lengths of the unabbreviated state and possession strings range from four characters (Guam, Iowa, Ohio, and Utah) to 30 characters (Federated States of Micronesia). While you could define a State column as CHARACTER(30) and compress its values, or as VARCHAR(30), the easiest and most compact method of denoting U.S. states and possessions is to use the two-character USPS encodings.

Storing NUMBER Data

You can achieve optimal space savings for NUMBER data using either multivalue compression, algorithmic compression, or a combination of both.

NUMBER data is stored as a variable length field from 0 to 18 bytes in width. You can use the NUMBER type to represent both fixed point and floating point decimal numbers, depending on the syntax you use to specify the type. In the following table, the characters p and s represent precision and scale, respectively.

General Type Syntax Functional Description
Fixed point NUMBER(p,s) Similar to DECIMAL(p,s)
NUMBER(p) Similar to DECIMAL(p)
Floating point NUMBER(*,s) Floating point decimal with s fractional digits
NUMBER A floating point number
NUMBER(*) Same as NUMBER

The following table highlights the approximate equivalences/valid substitutions between the fixed and floating point NUMBER types with other exact (fixed point) and approximate (floating point) types.

You can use this NUMBER type … Anywhere you can use this data type … With a maximum precision of this many digits …
Fixed point
  • DECIMAL
  • NUMERIC
38
Floating point
  • FLOAT
  • REAL
  • DOUBLE PRECISION
40

Whether a NUMBER type is used to represent a fixed point value or a floating point value, it is stored in the same way and with the same accuracy. See Floating Point NUMBER Types, Non-INTEGER Numeric Data Types, and Teradata Vantage™ - Data Types and Literals, B035-1143 for more information about the NUMBER type.

Storing VARCHAR, VARBYTE, and VARCHAR(n) CHARACTER SET GRAPHIC Data

You should consider storing VARCHAR, VARBYTE, and VARCHAR(n) CHARACTER SET GRAPHIC data as CLOB or BLOB, or data, respectively, if the columns in question are not accessed regularly. This consideration also applies to storing VARCHAR, VARBYTE, and VARCHAR(n) CHARACTER SET GRAPHIC data as XML data. This is because large objects are stored outside the base table row in subtables (subtable rows have the same row hash value as their associated base table rows, so both are stored on the same AMP), so there is no performance impact of non-LOB or non-XML queries against tables that have LOB or XML columns, while there is a performance cost to retrieving rows with inline VARCHAR, VARBYTE, and VARCHAR(n) CHARACTER SET GRAPHIC data if that data is not part of the request.

Conversely, character and byte columns that are frequently accessed should be stored as VARCHAR, VARBYTE, and VARCHAR(n) CHARACTER SET GRAPHIC if possible because of the performance cost of retrieving a LOB or XML string from its subtable.

Storing Character Data

For character data, an alternative to encodings and value compressing fixed-length CHARACTER(n) strings is to specify the variable- length VARCHAR or LONG VARCHAR data types. The number of bytes used to store each VARCHAR or LONG VARCHAR column is the length of the data item plus 2 bytes. Contrast this to the fixed-length CHARACTER data type which uses n bytes per row, regardless of the actual number of characters in each individual column.

The demographics of the data determine whether VARCHAR, LONG VARCHAR, or CHARACTER plus multivalue compression is more efficient. The most important factors are:
  • Maximum column length
  • Average column length
Evaluate the following factors when determining which approach to storing the data is the more efficient:
  • VARCHAR or LONG VARCHAR are more efficient when the difference of maximum and average column length is high and value compressibility is low.
  • Multivalue compression with CHARACTER data is more efficient when the difference of maximum and average column length is low and value compressibility is high.

When neither CHARACTER nor VARCHAR/LONG VARCHAR is clearly a superior choice, use VARCHAR or LONG VARCHAR because their data requires slightly less CPU resource to manipulate than CHARACTER data.