Use of Statistics for Optimizer - Teradata Vantage

Teradata® Open Table Format for Apache Iceberg and Delta Lake User Guide

Deployment
VantageCloud
VantageCore
Edition
VMware
Enterprise
IntelliFlex
Lake
Product
Teradata Vantage
Release Number
20.00
Published
October 2025
ft:locale
en-US
ft:lastEdition
2025-10-25
dita:mapPath
qrj1749167830193.ditamap
dita:ditavalPath
lli1749584660955.ditaval
dita:id
bsr1702324250454

OTF statistics are used by the Optimizer for cardinality estimation during query planning. These stats can come from two sources:

  • DBC.StatsTbl
  • Metadata JSON file

If both the sources are available, the Optimizer prioritizes DBC.StatsTbl.

Following are the four primary usage scenarios:

  • Key Spool Cardinality Estimation (KSCE)
  • Single Table Cardinality Estimation (STCE)
  • Join Cardinality Estimation (JCE)
  • Aggregation Cardinality Estimation (ACE)

The cardinality estimation and confidence levels associated with STCE, JCE, and ACE on OTF tables follow established patterns, ensuring consistency with existing methodologies. For KSCE, the following mechanisms are used:

  Confidence Estimated Cardinality
RET from Manifest List NO confidence, regardless of whether statistics are collected or single-table conditions are specified, because there is no information provided about the number of manifest files available to the Optimizer MIN of
  • The sequence number
  • The number of data files
RET from Manifest Files No single-table conditions are specified on partitioning columns HIGH confidence The number of data files
Single-Table conditions on partitioning columns with OTF statistics LOW confidence The number of data files * data row selectivity
Single-Table conditions on partitioning columns without OTF statistics NO confidence The number of data files * 33%

Examples: STCE and ACE

  • STCE
    • Actual cardinality: 6
    • Without stats: RET step shows NO confidence, estimated rows = 3
    • With stats: RET step shows HIGH confidence, estimated rows = 6
  • ACE
    • Actual cardinality: 6
    • Without stats: SUM step shows NO confidence, estimated rows = 18
    • With stats: SUM step shows LOW confidence, estimated rows = 6

These examples highlight how OTF statistics improve estimation accuracy and confidence during query planning.

  With Statistics for OTF Tables Without Statistics for OTF Tables
STCE
explain
sel city from datalake_iceberg_glue.vim.t1_part
where city = 'portland';

Explanation
---------------------------------------------------------------------------
...
  4) We do an all-AMPs RETRIEVE step in TD_MAP1 from a single column
     partition of datalake_iceberg_glue .vim.t1_part
     by way of external metadata in Spool 3 (Last Use) with a condition
     of ("datalake_iceberg_glue .vim.t1_part.city = 'portland'") into
     Spool 1 (group_amps), which is built locally on the AMPs.  The
     size of Spool 1 is estimated with high confidence to be 6 rows (
     96,126 bytes).  The estimated time for this step is 0.02 seconds.
...
explain
sel city from datalake_iceberg_glue.vim.t1_part
where city = 'portland';

Explanation
---------------------------------------------------------------------------
...
  4) We do an all-AMPs RETRIEVE step in TD_MAP1 from a single column
     partition of datalake_iceberg_glue .vim.t1_part
     by way of external metadata in Spool 3 (Last Use) with a condition
     of ("datalake_iceberg_glue .vim.t1_part.city = 'portland'") into
     Spool 1 (group_amps), which is built locally on the AMPs.  The
     size of Spool 1 is estimated with no confidence to be 3 rows (
     48,063 bytes).  The estimated time for this step is 0.02 seconds.
...
ACE
explain
sel city, count(*)
from datalake_iceberg_glue.vim.t1_part
group by city;

Explanation
---------------------------------------------------------------------------
...
  5) We do an all-AMPs SUM step in TD_MAP1 to aggregate from Spool 2
     (Last Use) by way of an all-rows scan, grouping by field1 (
     datalake_iceberg_glue .vim.t1_part.city).  Aggregate intermediate
     results are computed globally, then placed in Spool 6 in TD_MAP1.
     The size of Spool 6 is estimated with low confidence to be 6 rows
     (128,142 bytes).  The estimated time for this step is 0.03 seconds.
...
explain
sel city, count(*)
from datalake_iceberg_glue.vim.t1_part
group by city;

Explanation
---------------------------------------------------------------------------
...
  5) We do an all-AMPs SUM step in TD_MAP1 to aggregate from Spool 2
     (Last Use) by way of an all-rows scan, grouping by field1 (
     datalake_iceberg_glue .vim.t1_part.city).  Aggregate intermediate
     results are computed globally, then placed in Spool 6 in TD_MAP1.
     The size of Spool 6 is estimated with no confidence to be 18 rows
     (384,426 bytes).  The estimated time for this step is 0.03 seconds.
...