Comparative Accuracy of Sampled and Population Statistics
The key point when comparing the likely accuracy of statistics collected by the available methods is that population statistics are consistently more accurate than any form of sampled statistics. Statistics collected by a method with a lower probability of accuracy can be as accurate as those collected by a method with a higher probability of accuracy, but they are never more accurate.
You cannot know in advance whether you need to collect a full set of new statistics to ensure that the Optimizer produces the best query plans.
Ranking Relative Accuracies of Methods
- Dynamic AMP samples are better than residual statistics in the majority of cases. Dynamic AMP samples are also recollected each time a DBD is retrieved from disk, and therefore are typically more current than residual statistics.
- Dynamic all-AMPs samples are better than dynamic AMP samples in most cases.
- Full-table population statistics are typically better than sampled statistics.
The following table provides details to support these rankings. Each successively higher rank represents an increase in the accuracy of the statistics collected and a higher likelihood that the Optimizer produces a better query plan.
Collection Method | Relative Elapsed Time to Collect | Accuracy Rank (Higher Number = Higher Accuracy) | Comments |
---|---|---|---|
None. Use residual statistics. | None. | 1 | |
Dynamic AMP sample | Almost none. | 2 | |
All-AMP sample | Approximately 5% of time to perform a full-table scan. When the data is skewed, this percentage is larger, depending on how much the system dynamically increases its sample size. | 3 | |
Full-table scan | Approximately 195% of the time to perform sampled statistics. | 4 | |
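The ranking in the table can be read as a simple lookup: given a minimum acceptable accuracy rank, choose the lowest-ranked method that meets it. The following is a minimal Python sketch of that reading, not part of the database product. The names `CollectionMethod`, `METHODS`, and `cheapest_method` are hypothetical, and the sketch assumes that collection time rises with accuracy rank, which the table suggests but does not state as a rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionMethod:
    name: str
    accuracy_rank: int   # higher number = higher accuracy, as in the table
    relative_time: str   # elapsed-time description copied from the table


# Rows of the table above, in rank order (hypothetical structure).
METHODS = [
    CollectionMethod("Residual statistics", 1, "None"),
    CollectionMethod("Dynamic AMP sample", 2, "Almost none"),
    CollectionMethod("All-AMP sample", 3,
                     "Approximately 5% of time to perform a full-table scan"),
    CollectionMethod("Full-table scan", 4,
                     "Approximately 195% of the time to perform sampled statistics"),
]


def cheapest_method(min_rank: int) -> CollectionMethod:
    """Return the lowest-ranked method whose accuracy rank is at least min_rank.

    Assumption: a lower rank also means less time to collect.
    """
    candidates = [m for m in METHODS if m.accuracy_rank >= min_rank]
    if not candidates:
        raise ValueError(f"no collection method has accuracy rank >= {min_rank}")
    return min(candidates, key=lambda m: m.accuracy_rank)


if __name__ == "__main__":
    choice = cheapest_method(3)
    print(choice.name, "|", choice.relative_time)
```

Because a lower-ranked method can be as accurate as a higher-ranked one but never more accurate, the rank is best treated as an upper bound on expected accuracy, not a guarantee.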