Adaptive Histogram

Vantage Analytics Library User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, Lake, VMware
Product: Vantage Analytics Library
Release Number: 2.2.0
Published: March 2023
Language: English (United States)
Last Update: 2024-01-02

Product Category: Teradata Vantage

Adaptive Histogram analysis (also called adaptive binning) supplements Histogram analysis by further subdividing the distribution where values are concentrated.

You can apply Adaptive Histogram analysis to columns of any numeric type, including date types.

Adaptive Histogram analysis uses two user-specified thresholds: the frequency percentage above which a value is treated as a spike, and the percentage above which a bin is treated as overpopulated.

A spike is a single value of the variable that occurs in a disproportionately large number of rows, as defined by the spike percentage threshold. An overpopulated bin is a range of values that contains a disproportionately large number of rows, as defined by the overpopulation percentage threshold.
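To make the two thresholds concrete, here is a minimal Python sketch of how spikes and overpopulated bins could be identified. It is not the Vantage Analytics Library implementation; the function names (find_spikes_and_overpopulated, equal_width_edges, bin_index) are hypothetical, and it assumes that both thresholds are percentages of the total row count, that the initial bins are equal-width over the column's minimum to maximum, and that rows belonging to a spike are excluded before the remaining bins are tested for overpopulation.

```python
from collections import Counter
from typing import Sequence


def equal_width_edges(lo: float, hi: float, bins: int) -> list[float]:
    """Return bins + 1 evenly spaced boundaries from lo to hi."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins)] + [hi]


def bin_index(x: float, edges: Sequence[float]) -> int:
    """Index of the bin containing x; the last bin is closed on the right."""
    for i in range(len(edges) - 1):
        if x < edges[i + 1]:
            return i
    return len(edges) - 2


def find_spikes_and_overpopulated(values, bins, spike_pct, overpop_pct):
    """Hypothetical helper: classify spikes and overpopulated bins."""
    n = len(values)
    counts = Counter(values)
    # A spike is a single value whose share of all rows exceeds spike_pct.
    spikes = sorted(v for v, c in counts.items() if 100.0 * c / n > spike_pct)
    edges = equal_width_edges(min(values), max(values), bins)
    # Assumption: spike rows get their own bins, so they are left out
    # before the equal-width bins are tested for overpopulation.
    spike_set = set(spikes)
    bin_counts = [0] * bins
    for x in values:
        if x not in spike_set:
            bin_counts[bin_index(x, edges)] += 1
    # A bin is overpopulated when its share of all rows exceeds overpop_pct.
    overpopulated = [
        i for i, c in enumerate(bin_counts) if 100.0 * c / n > overpop_pct
    ]
    return spikes, edges, bin_counts, overpopulated
```

For example, on 20 rows in which the value 2 occurs six times (30% of the rows), a 20% spike threshold marks 2 as a spike, and it would then be given its own bin rather than inflating an equal-width bin.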

Adaptive Histogram analysis modifies the computed equal-sized bins so that each spike gets a separate bin and each overpopulated bin is further subdivided, and it returns the count and boundaries of every resulting bin. An overpopulated bin is subdivided as follows:

The bin is first divided into the same number of sub-bins as the original histogram has bins, and this subdivision is then merged with a second subdivision of the region around the mean value within the bin.
For the subdivision near the mean, the region from one standard deviation below the mean to one standard deviation above it is divided into the same number of sub-bins; if either end of this region lies outside the original bin, the bin boundary is used instead.
Optionally, the subdivision can use quantiles instead, producing sub-bins that contain approximately equal numbers of rows.
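The subdivision steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the library's code: subdivide_bin and count_per_subbin are hypothetical names, "dividing" is taken to mean equal-width sub-bins, "merging" is taken to mean the union of the two sets of boundaries, the population standard deviation is used, and the quantile option uses Python's statistics.quantiles with the "inclusive" method.

```python
import bisect
import statistics
from typing import Sequence


def _even_edges(lo: float, hi: float, k: int) -> list[float]:
    """Boundaries of k equal-width sub-bins between lo and hi."""
    width = (hi - lo) / k
    return [lo + i * width for i in range(k)] + [hi]


def subdivide_bin(rows: Sequence[float], lo: float, hi: float, k: int,
                  use_quantiles: bool = False) -> list[float]:
    """Hypothetical helper: sorted sub-bin boundaries for bin [lo, hi]."""
    if use_quantiles:
        # Optional mode: cut points at quantiles of the rows in the bin,
        # so each sub-bin holds roughly the same number of rows.
        cuts = statistics.quantiles(rows, n=k, method="inclusive")
        return sorted({lo, hi, *cuts})
    # Step 1: divide the bin itself into k equal-width sub-bins.
    edges = set(_even_edges(lo, hi, k))
    # Step 2: divide mean -/+ one standard deviation into k sub-bins,
    # falling back to the bin boundary where that region leaves the bin.
    mean = statistics.fmean(rows)
    sd = statistics.pstdev(rows)  # assumption: population standard deviation
    inner_lo, inner_hi = max(lo, mean - sd), min(hi, mean + sd)
    if inner_hi > inner_lo:
        edges.update(_even_edges(inner_lo, inner_hi, k))
    # Assumption: "merging" means taking the union of both boundary sets.
    return sorted(edges)


def count_per_subbin(rows: Sequence[float], edges: Sequence[float]) -> list[int]:
    """Count rows per sub-bin; the last sub-bin is closed on the right."""
    counts = [0] * (len(edges) - 1)
    for x in rows:
        i = bisect.bisect_right(edges, x) - 1
        counts[min(max(i, 0), len(counts) - 1)] += 1
    return counts
```

Taking the union of boundaries keeps the coarse equal-width split across the whole bin while adding finer boundaries where rows cluster near the mean, which is one plausible reading of the merge step.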

Adaptive Histogram analysis is useful for an initial investigation of the distribution of one or more columns in a table, to help decide which analysis to perform next. Without adaptive binning, spike values and overpopulated bins can distort the bin counts; adaptive binning isolates spikes and refines crowded bins. However, unlike Histogram analysis, Adaptive Histogram analysis does not offer binning by width, quantile, boundary, or over multiple dimensions, and does not support overlays or statistics on other columns.

To restrict the range of values that are binned or to reduce the number of rows to bin, use a WHERE clause.