Adaptive Histogram - Example #1

Teradata Warehouse Miner User Guide - Volume 1: Introduction and Profiling

Teradata Warehouse Miner, Release 5.4.4, User Guide, B035-2300-077K
  1. Parameterize an Adaptive Histogram analysis as follows:
    • Columns to Analyze — twm_customer.income
    • Spike Threshold — 10
    • Subdivision Threshold — 30
    • Subdivision Method — Means
    • Number of Bins — 10
  2. Run the analysis.
  3. When it completes, click the RESULTS tab.

    For this example, the Adaptive Histogram analysis generated the following results. The generated SQL is omitted for brevity.

    Adaptive Histogram Analysis Example #1 Data
    xtbl          xcol    xbeg           xend           xtype  xdesc  xcnt  xpct
    TWM_CUSTOMER  income  0              0              0      spike  102   13.6546185
    TWM_CUSTOMER  income  1039           15350.8        1      bin    243   32.5301205
    TWM_CUSTOMER  income  1039           2470.18        2      --bin  10    1.3386881
    TWM_CUSTOMER  income  2470.18        3901.36        2      --bin  23    3.0789826
    TWM_CUSTOMER  income  3901.36        4956.7726871   2      --bin  15    2.0080321
    TWM_CUSTOMER  income  4956.7726871   5696.7572443   3      **bin  4     .5354752
    TWM_CUSTOMER  income  5696.7572443   6436.7418016   3      **bin  22    2.9451138
    TWM_CUSTOMER  income  6436.7418016   7176.7263588   3      **bin  13    1.7402945
    TWM_CUSTOMER  income  7176.7263588   7916.710916    3      **bin  18    2.4096386
    TWM_CUSTOMER  income  7916.710916    8656.6954733   3      **bin  18    2.4096386
    TWM_CUSTOMER  income  8656.6954733   9396.6800305   3      **bin  16    2.1419009
    TWM_CUSTOMER  income  9396.6800305   10136.6645877  3      **bin  13    1.7402945
    TWM_CUSTOMER  income  10136.6645877  10876.6491449  3      **bin  10    1.3386881
    TWM_CUSTOMER  income  10876.6491449  11616.6337022  3      **bin  12    1.6064257
    TWM_CUSTOMER  income  11616.6337022  12356.6182594  3      **bin  17    2.2757697
    TWM_CUSTOMER  income  12356.6182594  12488.44       2      --bin  6     .8032129
    TWM_CUSTOMER  income  12488.44       13919.62       2      --bin  30    4.0160643
    TWM_CUSTOMER  income  13919.62       15350.8        2      --bin  16    2.1419009
    TWM_CUSTOMER  income  15350.8        29662.6        1      bin    194   25.9705489
    TWM_CUSTOMER  income  29662.6        43974.4        1      bin    104   13.9223561
    TWM_CUSTOMER  income  43974.4        58286.2        1      bin    54    7.2289157
    TWM_CUSTOMER  income  58286.2        72598          1      bin    18    2.4096386
    TWM_CUSTOMER  income  72598          86909.8        1      bin    19    2.5435074
    TWM_CUSTOMER  income  86909.8        101221.6       1      bin    7     .9370817
    TWM_CUSTOMER  income  101221.6       115533.4       1      bin    2     .2677376
    TWM_CUSTOMER  income  115533.4       129845.2       1      bin    2     .2677376
    TWM_CUSTOMER  income  129845.2       144157         1      bin    2     .2677376
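
The top-level bin boundaries (xtype 1) in this table are consistent with ten equal-width bins spanning the smallest and largest non-spike income values. The following Python sketch reproduces those boundaries under that assumption. The variable names are illustrative only and are not part of Teradata Warehouse Miner.

```python
# Assumption: the ten top-level bins split the non-spike range of income
# (1039 to 144157, as read from the results table) into equal widths.
low, high = 1039.0, 144157.0
number_of_bins = 10  # the "Number of Bins" parameter

width = (high - low) / number_of_bins

# Round to one decimal place to match the precision shown in the table.
edges = [round(low + i * width, 1) for i in range(number_of_bins + 1)]

# Print each (xbeg, xend) pair for comparison with the xtype 1 rows.
for begin, end in zip(edges, edges[1:]):
    print(begin, end)
```

Each printed pair can be compared against the xbeg and xend values of the rows marked "bin".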

    By default, the Adaptive Histogram Graph page should display a two-dimensional graph showing the distribution of the “income” column, as shown below.

    Adaptive Histogram Analysis Example #1 Graph

    This two-dimensional view shows the distribution of the “income” column (lower graph) as well as the range of values within each bin (upper graph). The upper graph also marks each data spike with a red triangle and each subdivided bin with a purple range of values. The value “0” is a spike because it accounts for 13.65% of the rows, which meets the Spike Threshold of 10% or more of all occurrences.
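
As a rough check of the spike rule, the sketch below recomputes the share of rows holding the value 0 from the counts in the results table. The total row count is taken as the sum of the spike count and the top-level bin counts. The names and the at-or-above comparison are assumptions based on the wording of this example, not the product's actual code.

```python
# Counts of the spike row and the ten top-level bins (xtype 0 and 1),
# copied from the xcnt column of the results table.
top_level_counts = [102, 243, 194, 104, 54, 18, 19, 7, 2, 2, 2]
total_rows = sum(top_level_counts)

spike_count = 102        # rows where income = 0
spike_threshold = 10.0   # the "Spike Threshold" parameter, in percent

spike_pct = 100.0 * spike_count / total_rows
is_spike = spike_pct >= spike_threshold  # assumed: "10% or more"

print(total_rows, round(spike_pct, 7), is_spike)
```

The computed percentage can be compared with the xpct value reported for the spike row.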

    The income bin in the range of 1039-15350.8 holds 32.53% of the values, which is more than the Subdivision Threshold of 30%, and has therefore been subdivided. To display this subdivision on a separate graph, left-click the blue distribution bar within the subdivided range of 1039-15350.8 and select “sub-bins”:

    Adaptive Histogram Analysis Example #1 Graph: Subdivision

    This histogram shows the distribution of data within the bin range 1039-15350.8, along with a range of values for each subdivision.
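
The subdivision rule and the sub-bin counts can be cross-checked against the results table. This Python sketch flags any top-level bin that holds more than 30% of the rows and confirms that the counts of the sub-bins (xtype 2 and 3) add up to the count of their parent bin. The names and the strict greater-than comparison are assumptions drawn from the wording of this example, not from the product's implementation.

```python
# Top-level bins (xtype 1) as (xbeg, xend, xcnt), from the results table.
top_level_bins = [
    (1039, 15350.8, 243), (15350.8, 29662.6, 194), (29662.6, 43974.4, 104),
    (43974.4, 58286.2, 54), (58286.2, 72598, 18), (72598, 86909.8, 19),
    (86909.8, 101221.6, 7), (101221.6, 115533.4, 2),
    (115533.4, 129845.2, 2), (129845.2, 144157, 2),
]
spike_count = 102  # rows where income = 0
total_rows = spike_count + sum(count for _, _, count in top_level_bins)
subdivision_threshold = 30.0  # the "Subdivision Threshold" parameter, in percent

# Assumed rule: a bin is subdivided when it holds more than the threshold.
subdivided = [
    (begin, end)
    for begin, end, count in top_level_bins
    if 100.0 * count / total_rows > subdivision_threshold
]
print(subdivided)

# xcnt values of the xtype 2 and xtype 3 rows inside 1039-15350.8, in table order.
sub_bin_counts = [10, 23, 15, 4, 22, 13, 18, 18, 16, 13, 10, 12, 17, 6, 30, 16]
print(sum(sub_bin_counts))
```

Only the 1039-15350.8 bin passes the threshold, and the sub-bin counts sum to that bin's count of 243.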

  4. Click << back to Histogram to return to the original distribution and re-enable the Graph Options tab.