The important thing to understand when considering the comparative a priori likelihoods of the accuracy of statistics collected by the various available methods is the consistently higher accuracy of population statistics over all forms of sampled statistics. It is always possible that statistics collected using a method with a lesser probability of accuracy will be as good as those collected at any given higher level of probable accuracy, but they will never be more accurate
Although this is undeniable, it is not possible to know a priori whether it is necessary to collect a full set of new statistics in order to ensure that the Optimizer produces the best query plans.
The best comparative estimates of the relative accuracies of the statistics collected by the various methods are described by the following ranked list.
1 Dynamic AMP samples are better than residual statistics in the majority of cases.
Dynamic AMP samples are also recollected each time a DBD is retrieved from disk, so they are nearly always more current than residual statistics.
2 Dynamic all‑AMPs samples are better than dynamic AMP samples in most cases.
3 Full‑table population statistics are usually better than any form of sampled statistics.
The following table provides some details to support these rankings. Each successively higher rank represents an increase in the accuracy of the statistics collected and a higher likelihood that the Optimizer will produce a better query plan because it has more accurate information to use for its estimates.
Collection Method |
Relative Elapsed Time to Collect |
Accuracy Rank (Higher Number = Higher Accuracy) |
Comments
|
None. Use residual statistics |
None. |
1 |
At worst, can produce a very poor query plan. At best, can produce as good a query plan as freshly collected statistics. |
Dynamic AMP sample |
Almost none. |
2 |
Collecting data from a subset of the rows on a single AMP is the default method. Depending on an internal DBS Control variable, the default number of AMPs sampled ranges over 1, 2, 5, all AMPs on a node, or all AMPs on a system. Consult your Teradata support representative for details. Other standard statistical parameters are less likely to be as accurate. |
All‑AMP sample |
Approximately 5% of time to perform a full‑table scan. When the data is skewed, this percentage is larger, depending on how much the system dynamically increases its sample size |
3 |
|
Full‑table scan |
Approximately 195% of the time to perform sampled statistics. |
4 |
|