Comparative Accuracy of Sampled and Population Statistics
The important thing to understand when considering the comparative a priori likelihoods of the accuracy of statistics collected by the various available methods is the consistently higher accuracy of population statistics over all forms of sampled statistics. It is always possible that statistics collected using a method with a lesser probability of accuracy will be as good as those collected at any given higher level of probable accuracy, but they will never be more accurate.
Although this is undeniable, it is not possible to know a priori whether it is necessary to collect a full set of new statistics in order to ensure that the Optimizer produces the best query plans.
Ranking the Relative Accuracies of the Various Methods
The best comparative estimates of the relative accuracies of the statistics collected by the various methods are described by the following ranked list.
- Dynamic AMP samples are better than residual statistics in the majority of cases.
Dynamic AMP samples are also recollected each time a DBD is retrieved from disk, so they are nearly always more current than residual statistics.
- Dynamic all-AMPs samples are better than dynamic AMP samples in most cases.
- Full-table population statistics are usually better than any form of sampled statistics.
The following table provides some details to support these rankings. Each successively higher rank represents an increase in the accuracy of the statistics collected and a higher likelihood that the Optimizer will produce a better query plan because it has more accurate information to use for its estimates.
Collection Method | Relative Elapsed Time to Collect | Accuracy Rank (Higher Number = Higher Accuracy) | Comments |
---|---|---|---|
None. Use residual statistics |
None. | 1 |
|
Dynamic AMP sample | Almost none. | 2 |
|
All-AMP sample | Approximately 5% of time to perform a full-table scan. When the data is skewed, this percentage is larger, depending on how much the system dynamically increases its sample size |
3 |
|
Full-table scan | Approximately 195% of the time to perform sampled statistics. | 4 |
|