Adaptive Histogram - Vantage Analytics Library

Vantage Analytics Library User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, Lake, VMware
Product: Vantage Analytics Library
Release Number: 2.2.0
Published: March 2023
Language: English (United States)
Last Update: 2024-01-02
Product Category: Teradata Vantage

Adaptive Histogram analysis (also called adaptive binning) supplements Histogram analysis by further subdividing the distribution.

You can apply Adaptive Histogram analysis to columns of any numeric type, including date types.

Adaptive Histogram analysis works with two frequency percentages: the percentage of rows above which a single value is treated as a spike, and the percentage of rows above which a bin is treated as overpopulated.

A spike is a single value of the variable that accounts for a disproportionately large number of rows. An overpopulated bin is a range of values of the variable that contains a disproportionately large number of rows. In both cases, what counts as disproportionately large is defined by the user.
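The two definitions above can be illustrated with a minimal Python sketch. It is not the Vantage Analytics Library implementation. The function names and the parameters `spike_pct` and `bin_pct` are hypothetical, and the sketch assumes both thresholds are percentages of the total row count, computed over all rows.

```python
from collections import Counter

def find_spikes(values, spike_pct):
    """Return the values whose share of all rows exceeds spike_pct percent."""
    total = len(values)
    freq = Counter(values)
    return sorted(v for v, c in freq.items() if 100.0 * c / total > spike_pct)

def equal_width_counts(values, num_bins):
    """Count rows in num_bins equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: every row has the same value, so all land in bin 0.
        return [len(values)] + [0] * (num_bins - 1)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value is placed in the last bin.
        counts[min(int((v - lo) / width), num_bins - 1)] += 1
    return counts

def overpopulated_bins(counts, bin_pct):
    """Return the indexes of bins whose share of all rows exceeds bin_pct percent."""
    total = sum(counts)
    return [i for i, c in enumerate(counts) if 100.0 * c / total > bin_pct]

# Hypothetical data: the value 5 occurs in 10 of 20 rows.
values = [5] * 10 + [0, 1, 2, 3, 4, 6, 7, 8, 9, 10]
spikes = find_spikes(values, spike_pct=10)
counts = equal_width_counts(values, num_bins=5)
crowded = overpopulated_bins(counts, bin_pct=30)
```

In this data the value 5 holds half of the rows, so it exceeds the 10 percent spike threshold, and the bin that contains it exceeds the 30 percent bin threshold.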

Adaptive Histogram analysis modifies the computed equal-sized bins to include a separate bin for each spike and to further subdivide an overpopulated bin, returning counts and boundaries for each resulting bin. The following occurs during the subdivision of an overpopulated bin:
  • The overpopulated bin is first divided into the same number of bins, and the result is then merged with a second subdivision of the region around the mean of the values in the bin.
  • The region around the mean runs from the mean minus the standard deviation to the mean plus the standard deviation, and is also divided into the same number of bins. Where that region extends beyond the original bin, it starts or ends at the bin boundary instead.
  • Optionally, the subdivision can use quantiles, which gives bins with approximately equal row counts.
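The subdivision steps listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the library's actual algorithm. The function names are hypothetical, "the same number of bins" is taken to mean a caller-supplied `num_bins`, and the standard deviation is assumed to be the population standard deviation of the rows in the bin.

```python
import statistics

def equal_width_edges(lo, hi, num_bins):
    """Return num_bins + 1 equally spaced boundaries from lo to hi."""
    step = (hi - lo) / num_bins
    return [lo + i * step for i in range(num_bins)] + [hi]

def subdivide_bin(values, lo, hi, num_bins):
    """Subdivide the bin [lo, hi]; values are the rows that fall inside it."""
    # Step 1: equal-width subdivision of the whole bin.
    edges = set(equal_width_edges(lo, hi, num_bins))
    # Step 2: region around the mean, -/+ one standard deviation,
    # clamped to the bin boundaries when it extends past them.
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population std dev
    near_lo = max(lo, mean - std)
    near_hi = min(hi, mean + std)
    if near_hi > near_lo:
        edges.update(equal_width_edges(near_lo, near_hi, num_bins))
    # Step 3: merge both sets of boundaries into one sorted list.
    return sorted(edges)

def quantile_edges(values, lo, hi, num_bins):
    """Optional variant: boundaries at quantiles of the rows in the bin."""
    cuts = statistics.quantiles(values, n=num_bins, method="inclusive")
    return [lo] + cuts + [hi]

# Hypothetical bin [0, 8] whose rows have mean 4 and standard deviation 1.
merged = subdivide_bin([3.0, 5.0, 3.0, 5.0], 0.0, 8.0, num_bins=2)
# Rows skewed toward the lower boundary: the region near the mean is clamped at 0.
clamped = subdivide_bin([1.0, 1.0, 1.0, 7.0], 0.0, 8.0, num_bins=2)
by_quantile = quantile_edges([1.0, 2.0, 3.0, 4.0], 0.0, 8.0, num_bins=2)
```

Collecting the boundaries in a set removes exact duplicates, such as a boundary shared by both subdivisions. A production version would also need to handle boundaries that differ only by floating-point rounding.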

Adaptive Histogram analysis is useful for an initial investigation of the distribution of a column or columns in a table to decide what analysis to perform next. Without Adaptive Histogram analysis, spike values and overpopulated bins can distort the bin counts. However, unlike Histogram analysis, Adaptive Histogram analysis does not offer binning by width, quantile, boundary, or over multiple dimensions, and does not allow use of overlay or statistics on other columns.
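The distortion caused by a spike can be seen in a small Python sketch that gives each spike its own bin before counting the remaining rows. The function name and the `spike_pct` parameter are hypothetical, and the sketch assumes the equal-width bins still span the full range of the column after the spike rows are set aside.

```python
from collections import Counter

def histogram_with_spike_bins(values, num_bins, spike_pct):
    """Return (spike counts by value, equal-width counts of the remaining rows)."""
    total = len(values)
    freq = Counter(values)
    # Each spike value gets its own bin, keyed by the value.
    spikes = {v: c for v, c in freq.items() if 100.0 * c / total > spike_pct}
    rest = [v for v in values if v not in spikes]
    # Assumes max(values) > min(values), so the bin width is nonzero.
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in rest:
        counts[min(int((v - lo) / width), num_bins - 1)] += 1
    return spikes, counts

# Hypothetical data: the value 5 occurs in 10 of 20 rows.
values = [5] * 10 + [0, 1, 2, 3, 4, 6, 7, 8, 9, 10]
spike_bins, rest_counts = histogram_with_spike_bins(values, num_bins=5, spike_pct=10)
```

With the spike counted inside its equal-width bin, the middle bin would hold 11 of the 20 rows. Once the spike has its own bin, the middle bin holds 1 row and the remaining counts are close to uniform.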

To further reduce the range of bins or the number of rows to bin, use a WHERE clause.