TD_Histogram Function

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

TD_Histogram calculates the frequency distribution of a dataset using one of the following methods:
  • Sturges
  • Scott
  • Variable-width
  • Equal-width
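
The page does not spell out how the Sturges and Scott methods choose bins. As a rough sketch, the textbook forms of the two rules are shown below: Sturges derives a bin count from the number of values, and Scott derives a bin width from the sample standard deviation. The function names and sample data are illustrative only, and TD_Histogram's own rounding and edge handling may differ.

```python
import math
import statistics


def sturges_bin_count(n):
    """Textbook Sturges' rule: k = ceil(log2(n)) + 1 bins for n values."""
    return math.ceil(math.log2(n)) + 1


def scott_bin_width(values):
    """Textbook Scott's rule: h = 3.49 * s * n^(-1/3), s = sample std dev."""
    n = len(values)
    s = statistics.stdev(values)
    return 3.49 * s * n ** (-1 / 3)


# Illustrative data, not taken from the documentation.
data = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 5.8, 6.1,
        6.3, 7.0, 7.7, 8.5, 9.2, 10.4, 12.0, 15.3]

k = sturges_bin_count(len(data))           # bin count from Sturges
h = scott_bin_width(data)                  # bin width from Scott
scott_bins = math.ceil((max(data) - min(data)) / h)  # bins needed to cover the range

print(f"Sturges: {k} bins")
print(f"Scott: width {h:.2f}, {scott_bins} bins")
```

Sturges depends only on the number of values, so it ignores how spread out they are; Scott adapts the bin width to the spread, which tends to matter more for large or widely dispersed datasets.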

A histogram is a graphical representation of the distribution of numerical data. It consists of a series of rectangles, or bars, which represent the frequency of occurrences of data values within certain intervals, or "bins." The x-axis of a histogram represents the range of values in the data set being analyzed, while the y-axis represents the frequency of occurrence of values within each bin. Each bar in the histogram represents a bin, and the height of the bar corresponds to the number of data points in that bin.
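
To make the bin-and-count idea concrete, here is a minimal sketch that splits the data range into equal-width bins and counts the values in each one. The function name `equal_width_histogram` and the sample values are assumptions made for illustration; this is not the TD_Histogram implementation.

```python
def equal_width_histogram(values, num_bins):
    """Return (bin edges, per-bin counts) for equal-width bins over the data range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value is placed in the last bin instead of opening a new one.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


values = [1, 2, 2, 3, 5, 6, 6, 6, 9, 10]
edges, counts = equal_width_histogram(values, 3)

# Print a text histogram: one row per bin, one '#' per data point.
for i, count in enumerate(counts):
    print(f"[{edges[i]:5.1f}, {edges[i + 1]:5.1f}) {'#' * count}")
```

Each printed row corresponds to one bar: the interval is the bar's position on the x-axis and the number of `#` marks is its height.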

Several types of histograms can be used, depending on the characteristics of the data being analyzed:
  • Continuous Histograms: Use for data that is continuous, such as height or weight. The bins are typically defined as intervals of equal width, and the area of each bar represents the proportion of data points that fall within that interval.
  • Discrete Histograms: Use for data that is discrete, such as the number of cars in a parking lot. The bins are typically defined as integer values, and the height of each bar represents the frequency of occurrence of each integer value.
  • Frequency Polygon: Replaces the bars with points, typically placed at the bin midpoints, and connects the points with line segments. This form is useful for comparing multiple datasets on the same graph.
  • Cumulative Histogram: Shows the cumulative frequency of the data, so the height of each bar is the total count of values in that bin and all preceding bins. This type of histogram is useful for identifying percentiles and cumulative proportions.
  • 2D Histogram: Represents the joint distribution of two variables. The x-axis and y-axis represent the two variables, and the height (or color) of each cell represents the frequency of each combination of binned values.
  • Kernel Density Estimation: A smoothed alternative to a histogram that estimates the probability density function of the data. It is useful for identifying the shape of the distribution and any peaks or outliers.
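
Two of the variants above can be derived directly from ordinary bin counts. The sketch below assumes hypothetical bin edges and counts, then computes the cumulative counts, the cumulative proportions, and the midpoint coordinates of a frequency polygon. The helper `first_bin_reaching` is an illustrative name, not part of any Teradata API.

```python
from itertools import accumulate

# Hypothetical bin edges and counts, for illustration only.
edges = [0, 10, 20, 30, 40]
counts = [3, 7, 8, 2]

# Cumulative histogram: running total of the counts, bin by bin.
cumulative = list(accumulate(counts))
total = cumulative[-1]
cumulative_share = [c / total for c in cumulative]

# Frequency polygon: one (midpoint, count) point per bin.
polygon_points = [
    ((edges[i] + edges[i + 1]) / 2, counts[i]) for i in range(len(counts))
]


def first_bin_reaching(share):
    """Index of the first bin whose cumulative proportion is at least `share`."""
    for i, s in enumerate(cumulative_share):
        if s >= share:
            return i
    return None


median_bin = first_bin_reaching(0.5)
print("cumulative counts:", cumulative)
print("polygon points:", polygon_points)
print("median falls in bin", (edges[median_bin], edges[median_bin + 1]))
```

Reading a percentile from a cumulative histogram amounts to finding the first bar whose height reaches the target proportion, which is what `first_bin_reaching` does.
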
A histogram offers several advantages for analyzing and visualizing data:
  • Provides a clear visual representation: Histograms are an effective way to represent large sets of data in a clear and concise manner. They allow you to quickly visualize the distribution of the data and identify patterns or outliers.
  • Easy to understand: Histograms are relatively simple to understand and do not require advanced statistical knowledge. They are a useful tool for communicating data to a wide range of audiences, including those without a background in statistics.
  • Allows for easy comparisons: Histograms make it easy to compare data sets and identify differences or similarities between them. Multiple histograms can be plotted on the same graph, making it easy to compare the distribution of different variables.
  • Provides information on central tendency and variability: The location of the tallest bars shows where values cluster, the shape of the histogram indicates whether the data is skewed or symmetrical, and the spread of the data can be estimated from the range of bins that contain data.
  • Identifies outliers: Histograms can be used to identify outliers or unusual data points that fall outside the typical range of values. This can be particularly useful for detecting errors or anomalies in a dataset.
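
As a rough illustration of reading central tendency and variability from binned data, the sketch below treats every value in a bin as sitting at the bin midpoint (the standard grouped-data approximation) and estimates the mean, standard deviation, and range. The edges and counts are hypothetical, and the results only approximate the statistics of the underlying values.

```python
import math

# Hypothetical bin edges and counts, for illustration only.
edges = [0, 10, 20, 30, 40]
counts = [3, 7, 8, 2]

midpoints = [(edges[i] + edges[i + 1]) / 2 for i in range(len(counts))]
n = sum(counts)

# Grouped-data estimates: every value in a bin is treated as the bin midpoint.
grouped_mean = sum(m * c for m, c in zip(midpoints, counts)) / n
grouped_var = sum(c * (m - grouped_mean) ** 2 for m, c in zip(midpoints, counts)) / n
grouped_std = math.sqrt(grouped_var)

# Spread estimated from the outermost bins that actually contain data.
occupied = [i for i, c in enumerate(counts) if c > 0]
approx_range = edges[occupied[-1] + 1] - edges[occupied[0]]

print(f"mean ~ {grouped_mean}, std ~ {grouped_std:.2f}, range <= {approx_range}")
```

Narrower bins make these estimates closer to the true values, at the cost of a noisier-looking histogram.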