There was a time not so very long ago when backups were a minor issue in database management. You set up a batch job to perform the backup overnight, while the system was not being used by anybody else, and that was that. Several issues have complicated that once-common scenario.
First is the tremendous increase in the size of databases that can be supported by a relational database management system like Teradata. Where at one time a “very large database” might be on the order of several gigabytes, multiterabyte databases supporting large data warehouses are now increasingly common, and support for petabyte, and eventually even yottabyte, databases is on the near horizon. Magnetic tape is a serial medium: even if you archive to multiple tape drives simultaneously, backing up a multiterabyte database still takes considerable time.
Second is the end of the era of the batch window. Enterprise data warehouses now typically support worldwide operations, and that means that the system must be available to users 24 hours per day, seven days a week. Without the old batch window, how do you back up your business-critical data without having a significant negative effect on the capability of your data warehouse to support its worldwide user base?
Many approaches to solving these problems fall outside the scope of database design, but at least one method for minimizing the impact of backing up very large databases can be designed into your database. That method is the subject of this topic.
For purposes of this topic, large and small tables and databases are defined as follows:
|Size|Definition|
|---|---|
|Small|Less than 100 GB|
|Large|Greater than 100 GB|
The issue described in this topic and its suggested solution apply to the large-system environment, primarily when using the NetVault product to back up to, and recover from, a different machine. The REEL product makes it possible to restore to or recover from the same machine using its fast path option, but even that option does not relieve the problem of backing up to a second machine.
Database and table dumps and restores are time-consuming. To minimize this problem, you can isolate large tables in their own databases so that their backup and restoration can likewise be isolated. Otherwise, when tables of this magnitude are mixed with much smaller tables, you have two choices:
- Back up the large table in one pass, then back up the remaining tables in a second pass.
- Back up the entire database at once, which takes a very long time to complete (and even longer to restore).
Design your database so that large tables are isolated within their own separate databases, then build views for your users to access the data. Carefully constructed views can mask this separation, permitting users to access data across multiple databases transparently within a single query. The following graphic illustrates one possible design that uses this method.
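As a rough sketch of the technique, the following Python snippet uses SQLite in place of Teradata to show how a view can span two databases: a large table lives in its own attached database (so it could be archived separately), while a view presents the joined data as if it were one schema. All table, database, and column names here are hypothetical, and SQLite's `ATTACH`/`TEMP VIEW` mechanics merely stand in for Teradata's multi-database views.

```python
import sqlite3

# Main connection plays the role of the "small tables" database.
conn = sqlite3.connect(":memory:")
# Attach a second, separate database to hold the large table in isolation.
conn.execute("ATTACH DATABASE ':memory:' AS big_db")

# A small reference table in the main database...
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
# ...and a (notionally very large) table isolated in its own database.
conn.execute("CREATE TABLE big_db.sales (cust_id INTEGER, amount REAL)")

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO big_db.sales VALUES (1, 250.0)")

# A TEMP view can reference both databases; users query the view without
# needing to know that 'sales' lives in a separately backed-up database.
conn.execute("""
    CREATE TEMP VIEW v_customer_sales AS
    SELECT c.name, s.amount
    FROM customer AS c
    JOIN big_db.sales AS s ON s.cust_id = c.cust_id
""")

rows = conn.execute("SELECT * FROM v_customer_sales").fetchall()
print(rows)
```

Because each underlying database is a distinct unit, the large table could be dumped and restored on its own schedule while the view definition, and therefore every user query written against it, remains unchanged.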