Adaptive Histogram analysis (also called adaptive binning) supplements Histogram analysis by further subdividing the distribution.
You can apply Adaptive Histogram analysis to columns of any numeric type, including date types.
Adaptive Histogram analysis uses two user-defined thresholds: the frequency percentage above which a value is treated as a spike, and the percentage above which a bin is treated as overpopulated.
A spike is a single value of the variable at which a disproportionately large number of rows occurs, while an overpopulated bin is a range of values that contains a disproportionately large number of rows. In both cases, the user-defined threshold determines what counts as disproportionately large.
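The two definitions above can be illustrated with a minimal Python sketch. It is not the product's implementation: the function names `find_spikes` and `find_overpopulated_bins`, the parameters `spike_pct` and `bin_pct`, and the use of equal-width bins are assumptions made for illustration. The sketch also counts spike rows when it fills the bins, which may differ from how the analysis itself treats them.

```python
from collections import Counter


def find_spikes(values, spike_pct):
    """Return the values whose share of all rows exceeds spike_pct (percent)."""
    counts = Counter(values)
    total = len(values)
    return sorted(v for v, c in counts.items() if 100.0 * c / total > spike_pct)


def find_overpopulated_bins(values, lo, hi, n_bins, bin_pct):
    """Return indices of equal-width bins over [lo, hi] whose share of
    all rows exceeds bin_pct (percent)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return [i for i, c in enumerate(counts) if 100.0 * c / total > bin_pct]
```

For example, calling `find_spikes(column_values, 10)` would list every value that accounts for more than 10 percent of the rows, where `column_values` is a hypothetical list holding the column's values.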
- An overpopulated bin is subdivided by first dividing it by the same number of bins, and then merging the result with a second subdivision of the region around the mean value within the bin.
- The region around the mean runs from the mean minus the standard deviation to the mean plus the standard deviation, and is also divided by the same number of bins. Where this region would extend outside the original bin, it starts or ends at the bin boundary instead.
- Optionally, subdividing can be done using quantiles, which gives approximately equally populated bins.
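The subdivision steps in this list can be sketched as follows. This is one possible reading of the description, not the product's actual algorithm: the names `equal_width_edges`, `subdivide_bin`, and `quantile_edges` are hypothetical, `n` stands for "the same number of bins", the population standard deviation is an arbitrary choice, and "merging" is interpreted here as taking the union of the two sets of bin edges.

```python
import statistics


def equal_width_edges(lo, hi, n):
    """Edges of n equal-width bins over [lo, hi] (n + 1 values)."""
    return [lo + (hi - lo) * i / n for i in range(n + 1)]


def subdivide_bin(values, lo, hi, n):
    """Subdivide the bin [lo, hi] and return the merged, sorted edges.

    Assumed interpretation: n equal-width sub-bins over the whole bin,
    merged with n equal-width sub-bins over mean -/+ one standard
    deviation, with that region clipped to the bin boundaries.
    """
    inside = [v for v in values if lo <= v <= hi]
    mean = statistics.fmean(inside)
    std = statistics.pstdev(inside)
    near_lo = max(lo, mean - std)  # clip at the lower bin boundary
    near_hi = min(hi, mean + std)  # clip at the upper bin boundary
    edges = set(equal_width_edges(lo, hi, n))
    if near_hi > near_lo:  # skip when all values in the bin are identical
        edges.update(equal_width_edges(near_lo, near_hi, n))
    return sorted(edges)


def quantile_edges(values, lo, hi, n):
    """Optional variant: n sub-bins with approximately equal row counts."""
    inside = [v for v in values if lo <= v <= hi]
    return [lo] + statistics.quantiles(inside, n=n) + [hi]
```

Taking the union of the edges gives finer bins near the mean, where an overpopulated bin is likely to hold most of its rows, while keeping the coarse equal-width edges elsewhere. A production version would also need to handle edges that differ only by floating-point rounding.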
Adaptive Histogram analysis is useful for an initial investigation of the distribution of a column or columns in a table to decide what analysis to perform next. Without Adaptive Histogram analysis, spike values and overpopulated bins can distort the bin counts. However, unlike Histogram analysis, Adaptive Histogram analysis does not offer binning by width, quantile, boundary, or over multiple dimensions, and does not allow use of overlay or statistics on other columns.
To further reduce the range of bins or the number of rows to bin, use a WHERE clause.