Determining the Reliability of Statistics From History Intervals| Vantage - Determining the Reliability of Statistics From History Intervals - Advanced SQL Engine - Teradata Database

SQL Request and Transaction Processing

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
17.00
Published
June 2020
Language
English (United States)
Last Update
2021-01-24
dita:mapPath
ykx1561500561173.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1142
lifecycle
previous
Product Category
Teradata Vantageā„¢

Vantage performs linear trend analyses (linear regression) of historical demographics to determine the reliability factor for downgrading to sampled statistics, applying threshold techniques, stale statistics extrapolations. The reliability factor is derived from a combination of various stability and consistency factors.

Vantage retains history records and analyzes changes in the trends of the cardinalities for collected statistics. The Optimizer attempts to find a linear relationship between each statistic and its cardinality trends. If it finds a reliable linear relationship, it models the relationship as a linear function. Vantage uses the linear functions it develops to estimate the cardinality for a current statistic at a given time. If the Optimizer does not find a reliable trend, it uses heuristics to estimate the cardinality of a statistic. When the historical change trends for statistics do not match their recent changes, Vantage purges the past statistics automatically.

This approach assumes that the current and future changes of statistics will be similar to any recent changes. For situations when that assumption is clearly not valid, such as when newly added or updated data is significantly different from recent past insertions deletions, or updates, you should recollect statistics.

Using a Trend Line in Estimating Statistics Values

The history of a statistic is a set of history records. Vantage maintains the following types of statistics history records:
  • Those based on timestamps and cardinalities
  • Those based on statistics values and cardinalities

Among the historical statistics values maintained are the number of unique values, the number of nulls, the high modal frequency, the minimum value, and the maximum value.

The criteria Vantage uses to select a single best trend line from the available candidate trend lines are as follows:
  • The trend line must be derived from the most recent history records
  • The trend line must be stable
  • The trend line must be consistent

The reliability of an estimated statistical value is determined by a weighting function. A reliability weighting function with values that range between 0 and 1 represents how reliable a statistic computed using a linear regression method is.

Vantage computes this reliability based on information it derives from stability and consistency estimates made for the most recent trend. Given a stability threshold, a trend analysis locates all of the recent trends that satisfy a given stability threshold.

Based on a rule that states that a trend line is stable if its stability value between the bounds of 0 and 1 is greater than some specified threshold, Vantage selects the trend that has the largest number of history records.

The following graph shows a trend analysis where the trend line indicates that there should be a minimum of 10 history intervals that satisfy the stability threshold to reach a reliability value greater than 0.9.



Estimating Statistics Values Using Linear Regression

When history data is available, Vantage extrapolates or interpolates statistics values using linear regression.

For example, Vantage uses linear regression to estimate various measures of a value and then combines them to estimate a statistical value.