When there is no legacy database to build on, capacity planning can be difficult. Fortunately, this is rarely an issue today, because at least part of a corporate database is almost always maintained in electronic form. Keep in mind that much of the information in this chapter assumes you have a legacy system to draw on for your sizing estimates.
Capacity planning should begin with the goal of keeping the most frequently accessed data available at all times. With the relatively low-priced, large-capacity disk storage units commonly used for data warehousing applications, the emphasis has shifted from offloading as much historical data as possible to archival storage toward keeping all data permanently online and accessible to the warehouse.
In a data warehouse that maintains massive quantities of historical data, the volume of data is typically inversely proportional to its use. In other words, there is an enormous amount of cool historical data that is accessed lightly, and a relatively smaller volume of hot and warm data that is accessed frequently.
The following, somewhat loose, default definitions apply to the commonly described temperature bands.
| Temperature band | Default definition |
|------------------|--------------------|
| COLD | The 20% of data that is least frequently accessed. |
| WARM | The remaining 60% of data, falling between the COLD and HOT bands. |
| HOT | The 20% of data that is most frequently accessed. |
| VERY HOT | Data that you or Teradata Virtual Storage determine should be added to the Very Hot cache list and have its temperature set to very hot when it is loaded using the TVSTemperature query band. |
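As a sketch of how a load might preset data temperature, the session query band can be set before the load runs. The table names below are illustrative, and the exact query band values available depend on your Teradata release, so treat this as an example rather than canonical syntax:

```sql
-- Sketch: mark rows loaded in this session as very hot so that
-- Teradata Virtual Storage can place them accordingly and add them
-- to the Very Hot cache list. Table names are hypothetical.
SET QUERY_BAND = 'TVSTemperature=VERYHOT;' FOR SESSION;

INSERT INTO sales_current
SELECT * FROM staging_sales;

-- Clear the query band when the load is complete.
SET QUERY_BAND = NONE FOR SESSION;
```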
The file system can change the compressed state of the data in an AUTOTEMP table at any time based on its temperature. Cylinders in an AUTOTEMP table become eligible for temperature-based block-level compression only when they reach or fall below the threshold defined for COLD temperature-based block-level compression. See “TempBLCThresh” in Utilities for more information about the temperature settings that you can use for temperature-based block-level compression.
Temperature-based thresholds for the block-level compression of AUTOTEMP tables work as defined by the following table.
| IF data blocks are initially … | AND then become … | THEN the file system … |
|--------------------------------|--------------------|--------------------------|
| block-level compressed | warmer than the defined threshold for compression | decompresses them. |
| not block-level compressed | colder than the defined threshold for decompression | compresses them. |
For tables that are not defined with BLOCKCOMPRESSION=AUTOTEMP, you must control the block-level compression state yourself using Ferret commands. Alternatively, if a table is not yet populated with rows, you can use one of the TVSTemperature query bands to specify the type of block-level compression to use for the newly loaded rows. If temperature-based block-level compression is disabled but block-level compression is enabled, Teradata Database treats AUTOTEMP tables the same as MANUAL tables.
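A table can opt in to file-system-managed compression at creation time with the BLOCKCOMPRESSION table option. The following is a minimal sketch, with hypothetical table and column names; see “CREATE TABLE” in SQL Data Definition Language for the authoritative syntax on your release:

```sql
-- Sketch: a history table whose block-level compression state is
-- managed automatically by the file system based on data temperature.
-- Table, column, and index names are illustrative.
CREATE TABLE sales_history ,
     NO FALLBACK ,
     BLOCKCOMPRESSION = AUTOTEMP
     (
       sale_id    INTEGER NOT NULL,
       sale_date  DATE FORMAT 'YYYY-MM-DD',
       amount     DECIMAL(10,2)
     )
PRIMARY INDEX ( sale_id );
```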
For all of the data in an AUTOTEMP table to be block-level compressed or decompressed at once, Teradata Virtual Storage must determine that all cylinders in the table have reached the threshold specified by the DBS Control parameter TempBLCThresh. For example, suppose the threshold value for TempBLCThresh is set to WARM.
| IF all of the cylinders in the table … | THEN they all become eligible for … |
|----------------------------------------|--------------------------------------|
| reach or fall below the WARM or COLD thresholds | block-level compression. |
| reach or exceed the HOT or VERY HOT thresholds | decompression. |
Because of this, the best practice for a table that requires compression consistency across the entire table is to avoid the AUTOTEMP option, or to avoid temperature-based block-level compression altogether.
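For a table that needs a uniform compression state, one approach is to define it with BLOCKCOMPRESSION = MANUAL so the file system never changes its state, and then compress or decompress the whole table yourself (for example, with Ferret commands). This is a sketch with hypothetical names, not canonical syntax:

```sql
-- Sketch: a table excluded from temperature-based compression so its
-- block-level compression state stays uniform across all cylinders.
-- Table, column, and index names are illustrative.
CREATE TABLE monthly_ledger ,
     NO FALLBACK ,
     BLOCKCOMPRESSION = MANUAL
     (
       ledger_id  INTEGER NOT NULL,
       period     DATE,
       balance    DECIMAL(12,2)
     )
PRIMARY INDEX ( ledger_id );
```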
Extended Data Lifetimes
The lifetime of data is now being extended for a number of reasons. Users often have varied performance requirements for time-based or historical data: recent data might be accessed frequently, while older data is accessed less often. Call these conceptual access rates hot, warm, cool, and icy, ranked from most frequently accessed to least frequently accessed. Keep in mind that these data access states are largely conceptual: cool and icy correspond loosely to the COLD temperature-based block-level compression state, while warm and hot correspond loosely to the identically named states.
In a warehouse with massive historical databases, the volume of data is typically inversely proportional to its usage, as illustrated by the following graphic, where the ordinate represents the relative warmth of the data and the abscissa represents the volume of data at each level of warmth.
In this picture, the temperature of the data reflects its access rate. The optimal storage for hot, warm, and cool data is online disk that is directly accessible to the data warehouse.
See “CREATE TABLE” and “ALTER TABLE” in SQL Data Definition Language for information about how you can specify the way an individual table deals with block-level compression based on the temperature of the data.
Cool and Icy Data
Typically, there is a vast quantity of cool historical data in the data warehouse that is accessed lightly and a lesser volume of hot and warm data that is accessed frequently.
While cool data might be accessed lightly on average, it still has temporal hot spots, such as those that occur when comparing sales data between different time periods. Similarly, if the relational schema is modified by adding new columns or indexes, or if column data types are changed, the affected data cannot be considered dormant. Finally, if data is periodically accessed and recast to make historical data relevant within the current business context, it is not dormant.
Truly dormant data is rarely, if ever, accessed, and is typically retained for various national and international regulatory reasons, such as the Sarbanes-Oxley Act in the United States, Bill 198 in Canada, or the Eighth Company Law Directive 1984/253/EEC in the European Union, rather than for operational reasons. As a result, the data stored in the warehouse is frequently a mix of both important and unimportant data (unimportant from the perspective of the day-to-day operation of the enterprise, not from a legal perspective), and a flexible management system is required to allocate the appropriate availability, reliability, and privacy levels for the data as its usage changes over time.
Icy, or truly dormant, data is a good candidate for alternative storage such as tape or optical disk. The treatment of icy data is not the subject of this chapter.
Warm, Hot, and Very Hot Data
Warm and hot data typically constitute what is often called the operational data store.
Hot and warm data are both important in the extended lifetime of data, particularly with respect to multivalue compression (see Compression Types Supported by Teradata Database), as is cool data to a lesser degree.