Equal Width versus Equal Height Histograms - Teradata Workload Analyzer

Teradata Workload Analyzer User Guide

Product
Teradata Workload Analyzer
Release Number
16.10
Published
May 2017
Language
English (United States)
Last Update
2018-10-13
Product Category
Teradata Tools and Utilities

A histogram is a vertical bar chart in which the frequency corresponding to a class is represented by the area of a bar (or rectangle) whose base is the class width. The histogram differs from a bar chart in that it is the area of the bar that denotes the value, not the height. However, if the widths of the bars are uniform (that is, equal-width) then only the height need be considered.

Teradata WA uses both equal-width and equal-height histograms in analyzing the "what" parameter.

Equal-Width Histograms

An equal-width histogram, such as that shown below, divides data into a fixed number of equal-width ranges. The height of each range represents the number of values falling into that range.

Equal-Width Histogram

For example, suppose that the values in a single column of a 1000-row table range between 1 and 100, and you want to generate a 10-bucket equal-width histogram. (Ranges in histograms are often referred to as buckets.) The buckets would cover the values 1-10, 11-20, 21-30, and so on, and each bucket counts the number of rows whose value falls into its range. For a list of supported analysis parameters, see Supported Analysis Parameters.
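The 10-bucket example above can be sketched in a few lines of Python. This is only an illustration of the bucketing arithmetic, not Teradata WA's implementation. The function name equal_width_histogram and the randomly generated column are assumptions made for the example.

```python
import random


def equal_width_histogram(values, low, high, num_buckets):
    """Count integer values in num_buckets equal-width ranges covering low..high.

    Hypothetical helper for illustration; not part of Teradata WA.
    """
    span = high - low + 1
    if span % num_buckets != 0:
        raise ValueError("the value range must divide evenly into the buckets")
    width = span // num_buckets
    counts = [0] * num_buckets
    for v in values:
        if not low <= v <= high:
            raise ValueError(f"value {v} is outside {low}..{high}")
        # Integer division maps 1-10 to bucket 0, 11-20 to bucket 1, and so on.
        counts[(v - low) // width] += 1
    bounds = [(low + i * width, low + (i + 1) * width - 1)
              for i in range(num_buckets)]
    return bounds, counts


# Synthetic stand-in for one column of a 1000-row table (assumed data).
rng = random.Random(0)
column = [rng.randint(1, 100) for _ in range(1000)]

bounds, counts = equal_width_histogram(column, 1, 100, 10)
for (start, end), count in zip(bounds, counts):
    print(f"{start:3d}-{end:3d}: {count}")
```

Every bucket has the same width (10 values here), so the counts alone describe the distribution and can be read directly as bar heights.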

Equal-Height Histograms

Equal-width histograms work well when the variation of the data distribution is small. They do not work as well, however, when the variation is large. For example, in the figure above, 95% of the data falls into the first bucket, and the small remainder is scattered across the other 19 buckets, making it difficult to analyze the data effectively.

In this situation, an equal-height histogram, such as that shown in the figure below, is a better choice. Such histograms work well when the variation in data distribution is large. Unlike equal-width histograms, they place the same number of values into each range, so the endpoints of each range are determined by the values it contains rather than fixed in advance.
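As a minimal sketch of the idea, the following Python code sorts the values and cuts the sorted list into ranges that each hold the same number of values. It is not the algorithm Teradata WA uses. The function name equal_height_histogram and the skewed sample data are assumptions made for illustration.

```python
def equal_height_histogram(values, num_buckets):
    """Split values into num_buckets ranges holding (nearly) equal counts.

    Returns a list of (lowest value, highest value, count) tuples.
    Hypothetical helper for illustration; not part of Teradata WA.
    """
    if num_buckets <= 0 or not values:
        raise ValueError("need at least one bucket and one value")
    ordered = sorted(values)
    n = len(ordered)
    buckets = []
    for i in range(num_buckets):
        # Slice boundaries are chosen by position, so each slice gets ~n/num_buckets values.
        chunk = ordered[i * n // num_buckets:(i + 1) * n // num_buckets]
        if chunk:
            buckets.append((chunk[0], chunk[-1], len(chunk)))
    return buckets


# Made-up, heavily skewed sample: 95 small values and 5 large ones.
sample = [0.001 * k for k in range(1, 96)] + [1.0, 2.0, 4.0, 6.0, 8.0]

buckets = equal_height_histogram(sample, 5)
for low, high, count in buckets:
    print(f"{low:.3f} to {high:.3f}: {count} values")
```

Each bucket holds 20 values, but the last bucket spans a far wider range than the others, which is how the bucket endpoints expose the skew. This simple version splits purely by position, so equal values can land on both sides of a boundary.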

Equal-Height Histogram

The graph itself is less informative than the bin widths identified on its x-axis. They reveal not only that the vast majority of data points fall in the 0–8.41 range, but also that most queries (80%) consume less than 0.01 CPU seconds.

This information is provided in a pop-up dialog box for each equal-height histogram. For instructions on viewing histograms and their data, see Displaying a "What" Parameter Histogram.