In OTF, statistics on a dataset of an OTF column are stored in a Metadata JSON file for query optimization.
However, this standard approach has key limitations compared to Teradata’s statistics mechanisms:
- No Histogram Support: Only scalar stats (for example, null count, distinct values) are collected, whereas Teradata supports histograms.
- No Expression-Based Stats: Stats can be collected on columns or column combinations, but not on expressions (for example, (temperature-32)/1.8), which Teradata supports.
- Limited Stat Types: The data sketch only provides unique value counts. Teradata collects additional stats like high mode frequency, partial null counts, and unique values on partial-null entries.
This feature builds on Teradata’s existing infrastructure, aligning with industry standards. The key focus areas include:
- Teradata Syntax Support: Commands like COLLECT STATS, DROP STATS, HELP STATS, and SHOW STATS now support OTF stats. However, using the AS clause to name stats entries is not allowed.
- Histogram Collection: Supports column-based histograms and stats like high mode frequency, similar to Teradata’s native capabilities. Expression-based stats are not supported due to naming and API limitations.
- Optimizer Behavior: Optimizer first checks StatsTbl, then falls back to Metadata JSON - allowing use of vendor-generated OTF stats.