Histogram - Example #2 - Teradata Warehouse Miner - 5.4.4

Teradata Warehouse Miner User Guide - Volume 1: Introduction and Profiling

prodname: Teradata Warehouse Miner
vrm_release: 5.4.4
category: User Guide
featnum: B035-2300-077K
  1. Parameterize a Histogram analysis as follows:
    • Histogram Style — Basic
    • Columns to Analyze
      • twm_customer.age
      • twm_customer.income
    • Overlay Columns — gender
    • Statistics Columns — nbr_children
  2. Run the analysis in the same manner as described above.

    This time, the following Results should be generated. Again, the SQL is not shown.

    Histogram Analysis Example #2 Table
    xtbl xcol xbin xbeg xend xcnt xpct ovly_gen xocnt xopbct xopct
    twm_c... age 1 13 20.6 140 18.7416332 F 78 55.7142857 10.4417671
    twm_c... age 1 13 20.6 140 18.7416332 M 62 44.2857143 8.2998661
    twm_c... age 2 20.6 28.2 56 7.4966533 F 33 58.9285714 4.4176707
    twm_c... age 2 20.6 28.2 56 7.4966533 M 23 41.0714286 3.0789826
    twm_c... age 3 28.2 35.8 92 12.3159304 F 49 53.2608696 6.5595716
    twm_c... age 3 28.2 35.8 92 12.3159304 M 43 46.7391304 5.7563588
    twm_c... age 4 35.8 43.4 107 14.3239625 F 63 58.8785047 8.4337349
    twm_c... age 4 35.8 43.4 107 14.3239625 M 44 41.1214953 5.8902276
    twm_c... age 5 43.4 51 88 11.7804552 F 52 59.0909091 6.961178
    twm_c... age 5 43.4 51 88 11.7804552 M 36 40.9090909 4.8192771
    twm_c... age 6 51 58.6 110 14.7255689 F 58 52.7272727 7.7643909
    twm_c... age 6 51 58.6 110 14.7255689 M 52 47.2727273 6.961178
    twm_c... age 7 58.6 66.2 71 9.5046854 F 41 57.7464789 5.4886212
    twm_c... age 7 58.6 66.2 71 9.5046854 M 30 42.2535211 4.0160643
    twm_c... age 8 66.2 73.8 35 4.6854083 F 17 48.5714286 2.2757697
    twm_c... age 8 66.2 73.8 35 4.6854083 M 18 51.4285714 2.4096386
    twm_c... age 9 73.8 81.4 28 3.7483266 F 18 64.2857143 2.4096386
    twm_c... age 9 73.8 81.4 28 3.7483266 M 10 35.7142857 1.3386881
    twm_c... age 10 81.4 89 20 2.6773762 F 9 45 1.2048193
    twm_c... age 10 81.4 89 20 2.6773762 M 11 55 1.4725569
    twm_c... income 1 0 14415.7 332 44.4444444 F 200 60.2409639 26.7737617
    twm_c... income 1 0 14415.7 332 44.4444444 M 132 39.7590361 17.6706827
    twm_c... income 2 14415.7 28831.4 191 25.5689424 F 117 61.2565445 15.6626506
    twm_c... income 2 14415.7 28831.4 191 25.5689424 M 74 38.7434555 9.9062918
    twm_c... income 3 28831.4 43247.1 108 14.4578313 F 50 46.2962963 6.6934404
    twm_c... income 3 28831.4 43247.1 108 14.4578313 M 58 53.7037037 7.7643909
    twm_c... income 4 43247.1 57662.8 63 8.4337349 F 30 47.6190476 4.0160643
    twm_c... income 4 43247.1 57662.8 63 8.4337349 M 33 52.3809524 4.4176707
    twm_c... income 5 57662.8 72078.5 20 2.6773762 F 12 60 1.6064257
    twm_c... income 5 57662.8 72078.5 20 2.6773762 M 8 40 1.0709505
    twm_c... income 6 72078.5 86494.2 19 2.5435074 F 6 31.5789474 .8032129
    twm_c... income 6 72078.5 86494.2 19 2.5435074 M 13 68.4210526 1.7402945
    twm_c... income 7 86494.2 100909.9 7 .9370817 F 1 14.2857143 .1338688
    twm_c... income 7 86494.2 100909.9 7 .9370817 M 6 85.7142857 .8032129
    twm_c... income 8 100909.9 115325.6 3 .4016064 F 2 66.6666667 .2677376
    twm_c... income 8 100909.9 115325.6 3 .4016064 M 1 33.3333333 .1338688
    twm_c... income 9 115325.6 129741.3 2 .2677376 M 2 100 .2677376
    twm_c... income 10 129741.3 144157 2 .2677376 M 2 100 .2677376
    Histogram Analysis Example #2 Data
    xtbl xmin_nbr... xmax_nbr... xmean_nbr... xstd_nbr...
    twm_c... 0 0 0 0
    twm_c... 0 0 0 0
    twm_c... 0 2 .7878788 .7690047
    twm_c... 0 2 .5217391 .650723
    twm_c... 0 3 1.6326531 1.1192905
    twm_c... 0 3 1.5813953 1.1858185
    twm_c... 0 5 1.3333333 1.5013222
    twm_c... 0 5 1.5227273 1.3566107
    twm_c... 0 5 .8653846 1.2093998
    twm_c... 0 5 1.2222222 1.7497795
    twm_c... 0 2 .9655172 .6939521
    twm_c... 0 2 .8461538 .7174907
    twm_c... 0 2 9.7560976E-02 .3698964
    twm_c... 0 2 .1333333 .4268749
    twm_c... 0 0 0 0
    twm_c... 0 0 0 0
    twm_c... 0 0 0 0
    twm_c... 0 0 0 0
    twm_c... 0 0 0 0
    twm_c... 0 0 0 0
    twm_c... 0 3 .315 .7181748
    twm_c... 0 3 .25 .6077155
    twm_c... 0 5 1 1.132277
    twm_c... 0 4 .7297297 1.0691914
    twm_c... 0 5 .82 1.1779643
    twm_c... 0 5 1.2413793 1.3684921
    twm_c... 0 5 1.4666667 1.2578642
    twm_c... 0 5 1.6666667 1.6080605
    twm_c... 0 4 1.75 1.4215602
    twm_c... 0 2 1 .8660254
    twm_c... 0 4 1 1.4142136
    twm_c... 0 3 .8461538 .9483714
    twm_c... 1 1 1 0
    twm_c... 0 2 .5 .7637626
    twm_c... 0 2 1 1
    twm_c... 0 0 0 0
    twm_c... 1 2 1.5 .5
    twm_c... 0 0 0 0
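
The three percentage columns can be reproduced from the counts in the table above. The sketch below assumes that xpct is the bin count as a share of all rows, xopbct is the overlay count as a share of its bin, and xopct is the overlay count as a share of all rows. This reading is inferred from the printed values, not from the generated SQL (which is not shown), and the function name is hypothetical.

```python
def overlay_percentages(xcnt, xocnt, total):
    """Return (xpct, xopbct, xopct) for one overlay row, rounded to 7 places."""
    xpct = 100.0 * xcnt / total      # bin count as a share of all rows (assumed)
    xopbct = 100.0 * xocnt / xcnt    # overlay count as a share of its bin (assumed)
    xopct = 100.0 * xocnt / total    # overlay count as a share of all rows (assumed)
    return round(xpct, 7), round(xopbct, 7), round(xopct, 7)


# Bin counts for "age", copied from the xcnt column of the table.
AGE_BIN_COUNTS = [140, 56, 92, 107, 88, 110, 71, 35, 28, 20]
TOTAL_ROWS = sum(AGE_BIN_COUNTS)

# First "age" bin, gender = "F": xcnt = 140, xocnt = 78.
first_bin_f = overlay_percentages(140, 78, TOTAL_ROWS)
print(TOTAL_ROWS, first_bin_f)
```

Under these assumptions the result agrees with the first row of the table (18.7416332, 55.7142857, 10.4417671).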

    By default, the same two-dimensional graph shown in Example #1 appears.

  3. Go to the Graph Options tab.
  4. Select the Show Overlay Counts and 3D Graph radio buttons.

    This graph is shown below.

    Histogram Analysis Example #2 Graph: Three Dimensional View

    Note that the ranges for the “age” column are the same as before (“13” to “89”). Now, however, each bin has been overlaid with the distinct values of “gender” (“M” and “F”). The count for each overlay value is represented by height. For the first bin range of “age” (“13 to 20.6”), the graph shows approximately 75 females (where “gender” = “F”) and 60 males (where “gender” = “M”); the exact counts in the table are 78 and 62.
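
The bin ranges in the table are consistent with ten equal-width bins between the smallest and largest value of each column. The sketch below reproduces them under that assumption; the function name and the default of ten bins are illustrative and are not taken from the product.

```python
def equal_width_bins(lo, hi, nbins=10):
    """Return (xbeg, xend) pairs that split [lo, hi] into nbins equal ranges."""
    width = (hi - lo) / nbins
    # Round to one decimal place to match the precision printed in the table.
    return [(round(lo + i * width, 1), round(lo + (i + 1) * width, 1))
            for i in range(nbins)]


age_bins = equal_width_bins(13, 89)         # "age" runs from 13 to 89
income_bins = equal_width_bins(0, 144157)   # "income" runs from 0 to 144157
print(age_bins[0], income_bins[0])
```

The first pairs returned are 13 to 20.6 for “age” and 0 to 14415.7 for “income”, as in the xbeg and xend columns.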

    This image can be rotated either automatically, by double-clicking anywhere on the graph, or manually, by using the vertical and horizontal scroll-bars.

    When rotating, the number (0 to 359) in the upper left-hand corner of the graph gives the degrees of rotation about the z-axis. This value changes as the horizontal scroll-bar is adjusted.

  5. Click on the Graph Options tab as described above to change the data being graphed from “age” to “income.”
  6. Select the Show Bin Stats radio button.

    Note that this disables the Show Overlay Counts option as well as 3D Graph.

    This graph is shown here.

    Histogram Analysis Example #2 Graph

    Note that the first bin range of the “income” variable is 0-14415.7. This bin is broken down into two pieces, one for each overlay value (“M” and “F”). Within this first bin of income, there are 200 females (where “gender” = “F”) and 132 males (where “gender” = “M”). The statistics for “nbr_children” are also shown. For females in this bin, the minimum value of nbr_children is 0, the maximum is 3, the mean is .315 and the standard deviation is .7181748. This is illustrated graphically by the orange square (mean), the wide blue bar (+/- one standard deviation), and the upper and lower blue lines (maximum and minimum). Because the mean minus one standard deviation (.315 - .7181748, which is below 0) already encompasses the minimum value, no lower blue line is shown.
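
The arithmetic behind the bin statistics marker can be sketched as follows. The code assumes that a minimum or maximum line is drawn only when that value falls outside the one-standard-deviation bar, which is inferred from the note above; the function name and the dictionary keys are hypothetical.

```python
def bin_stats_glyph(xmin, xmax, xmean, xstd):
    """Describe the Show Bin Stats marker for one overlay value in one bin."""
    bar_low, bar_high = xmean - xstd, xmean + xstd
    return {
        "mean": xmean,                         # orange square
        "bar": (bar_low, bar_high),            # wide blue bar, +/- one std dev
        "lower_line_visible": xmin < bar_low,  # assumed drawing rule
        "upper_line_visible": xmax > bar_high, # assumed drawing rule
    }


# nbr_children statistics for the first "income" bin, gender = "F".
glyph = bin_stats_glyph(0, 3, 0.315, 0.7181748)
print(glyph)
```

With these inputs the bar extends below the minimum of 0, so only the upper line would be drawn under the assumed rule.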