TD_Histogram

Database Analytic Functions

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-04-06
Product Category: Teradata Vantage™
TD_Histogram calculates the frequency distribution of a dataset using one of the following methods:
  • Sturges
  • Scott
  • Variable-width
  • Equal-width
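
As a rough illustration of how two of these methods choose bins, the sketch below uses the textbook forms of Sturges' rule (bin count k = ceil(log2(n)) + 1) and Scott's rule (bin width h = 3.49 * s * n^(-1/3), where s is the sample standard deviation). It is written in Python rather than SQL and is not the TD_Histogram implementation; the function names and sample data are hypothetical, and the rounding that TD_Histogram applies may differ.

```python
import math
import statistics


def sturges_bin_count(n: int) -> int:
    """Textbook Sturges' rule: k = ceil(log2(n)) + 1 bins for n values."""
    if n < 1:
        raise ValueError("need at least one value")
    return math.ceil(math.log2(n)) + 1


def scott_bin_width(values: list[float]) -> float:
    """Textbook Scott's rule: h = 3.49 * s * n ** (-1/3)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    return 3.49 * statistics.stdev(values) * n ** (-1 / 3)


# Hypothetical sample data, used only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(sturges_bin_count(len(data)))  # 4
print(scott_bin_width(data))
```

Sturges' rule depends only on the number of values, while Scott's rule also accounts for how spread out the values are, so the two can suggest different bin layouts for the same data.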

A histogram is a graphical representation of the distribution of numerical data. It consists of a series of rectangles, or bars, which represent the frequency of occurrence of data values within certain intervals, or "bins." The x-axis of a histogram represents the range of values in the dataset being analyzed, while the y-axis represents the frequency of occurrence of values within each bin. Each bar in the histogram represents a bin, and the height of the bar corresponds to the number of data points in that bin.
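
The relationship between bins and bar heights can be sketched in a few lines of Python. This is a minimal equal-width example under stated assumptions, not the TD_Histogram algorithm: the function name, the sample data, and the choice to make bins closed on the left (with the maximum value placed in the last bin) are all hypothetical.

```python
def equal_width_histogram(values, n_bins):
    """Return a list of (bin_start, bin_end, count) for equal-width bins."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 if every value is identical.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        # Bins are closed on the left; the maximum value goes in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c)
            for i, c in enumerate(counts)]


# Hypothetical sample data: ten heights in centimeters.
heights_cm = [150, 152, 158, 160, 161, 165, 170, 171, 179, 190]

# Print one text "bar" per bin; the bar length is the bin's count.
for start, end, count in equal_width_histogram(heights_cm, 4):
    print(f"{start:.0f}-{end:.0f}: {'#' * count} ({count})")
```

Each printed row corresponds to one bar: the label is the bin's interval on the x-axis and the number of `#` characters is the frequency on the y-axis.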

Several types of histograms can be used, depending on the characteristics of the data being analyzed:
  • Continuous Histograms: Use for data that is continuous, such as height or weight. The bins are typically defined as intervals of equal width, and the area of each bar represents the proportion of data points that fall within that interval.
  • Discrete Histograms: Use for data that is discrete, such as the number of cars in a parking lot. The bins are typically defined as integer values, and the height of each bar represents the frequency of occurrence of each integer value.
  • Frequency Polygon: Replaces the bars with points, one per bin, connected by line segments. This type of histogram is useful for comparing multiple datasets on the same graph.
  • Cumulative Histogram: Shows the cumulative frequency of the data, so each bar's height is the count of all values up to and including that bin. This type of histogram is useful for identifying percentiles and cumulative proportions.
  • 2D Histogram: Represents the joint distribution of two variables. The x-axis and y-axis represent the two variables, and the height of each bar represents the frequency of occurrence of each combination of values.
  • Kernel Density Estimation Histogram: A smoothed version of a histogram that estimates the probability density function of the data. This type of histogram is useful for revealing the shape of the distribution and for spotting peaks or outliers.
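
Two of the variants above, the cumulative histogram and the frequency polygon, can be derived directly from ordinary bin counts. The following Python sketch shows one way to do that; the bin edges and counts are hypothetical values chosen for illustration, and placing each polygon point at the bin midpoint is the usual convention rather than something this page specifies.

```python
from itertools import accumulate

# Hypothetical bin edges and per-bin counts (five edges define four bins).
edges = [150, 160, 170, 180, 190]
counts = [3, 3, 3, 1]

# Cumulative histogram: each bar holds the count of all values up to that bin.
cumulative = list(accumulate(counts))

# Cumulative proportions support percentile-style questions.
total = cumulative[-1]
proportions = [c / total for c in cumulative]

# Frequency polygon: one point per bin at (bin midpoint, count).
polygon = [((a + b) / 2, c) for a, b, c in zip(edges, edges[1:], counts)]

print(cumulative)   # [3, 6, 9, 10]
print(proportions)  # [0.3, 0.6, 0.9, 1.0]
print(polygon)      # [(155.0, 3), (165.0, 3), (175.0, 3), (185.0, 1)]
```

Reading the proportions, 90% of the values fall below the upper edge of the third bin (180), which is the kind of percentile lookup a cumulative histogram makes easy.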
A histogram offers several advantages for analyzing and visualizing data:
  • Provides a clear visual representation: Histograms are an effective way to represent large sets of data in a clear and concise manner. They allow you to quickly visualize the distribution of the data and identify patterns or outliers.
  • Easy to understand: Histograms are relatively simple to understand and do not require advanced statistical knowledge. They are a useful tool for communicating data to a wide range of audiences, including those without a background in statistics.
  • Allows for easy comparisons: Histograms make it easy to compare data sets and identify differences or similarities between them. Multiple histograms can be plotted on the same graph, making it easy to compare the distribution of different variables.
  • Provides information on central tendency and variability: The position of the tallest bars shows where values concentrate, and the shape of the histogram can indicate whether the data is skewed or symmetrical. The spread of the data can be estimated from the range of the occupied bins.
  • Identifies outliers: Histograms can be used to identify outliers or unusual data points that fall outside the typical range of values. This can be particularly useful for detecting errors or anomalies in a dataset.
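
Several of these readings can also be taken programmatically from bin counts. The Python sketch below assumes a hypothetical list of (bin_start, bin_end, count) tuples and uses a deliberately simple heuristic for outliers (an occupied bin whose neighboring bins are all empty); it is an illustration, not a statistical test and not part of TD_Histogram.

```python
# Hypothetical histogram: (bin_start, bin_end, count) with equal-width bins.
bins = [(0, 10, 2), (10, 20, 9), (20, 30, 14), (30, 40, 6),
        (40, 50, 0), (50, 60, 0), (60, 70, 1)]

occupied = [b for b in bins if b[2] > 0]

# Central tendency: the modal bin is the one with the highest count.
modal_bin = max(bins, key=lambda b: b[2])

# Variability: the span of the occupied bins bounds the range of the data.
spread = (occupied[0][0], occupied[-1][1])


def isolated_bins(bins):
    """Occupied bins whose existing neighbors are all empty (a heuristic)."""
    result = []
    for i, (start, end, count) in enumerate(bins):
        neighbors = bins[max(i - 1, 0):i] + bins[i + 1:i + 2]
        if count > 0 and all(n[2] == 0 for n in neighbors):
            result.append((start, end, count))
    return result


print(modal_bin)            # (20, 30, 14)
print(spread)               # (0, 70)
print(isolated_bins(bins))  # [(60, 70, 1)]
```

Here the single value in the 60-70 bin is separated from the bulk of the data by empty bins, which is the visual cue a reader would use to flag it for a closer look.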