Description
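Histograms are useful for assessing the shape of a data distribution. The Histogram function calculates the frequency distribution of a data set. It can choose the bin width and number of bins automatically with a binning rule such as Sturges' rule (auto.bin). It can also create equal-width bins from a bin size and start and end values (bin.size, start.value, end.value), or use bin boundaries supplied in a custom table (custom.bin.table, custom.bin.column). The function maps each input row to one bin and returns the frequency (row count) and proportion (percentage of rows) of each bin.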
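The following local sketch may help clarify what is being computed. It does not call Teradata: it uses base R, with the built-in mtcars$hp column standing in for the horsepower data. It applies Sturges' rule (k = ceiling(log2(n)) + 1 bins), maps each value to one bin, and reports each bin's count and percentage. How the Teradata function actually places its bin boundaries is not described here, so splitting the data range into k equal-width bins is an assumption of this sketch.

```r
# Sketch only (base R, no database): mtcars$hp stands in for the hp column.
x <- mtcars$hp

# Sturges' rule: number of bins k = ceiling(log2(n)) + 1
k <- ceiling(log2(length(x))) + 1

# Assumption: k equal-width bins spanning min(x) to max(x)
breaks <- seq(min(x), max(x), length.out = k + 1)
bins <- cut(x, breaks = breaks, include.lowest = TRUE)

# Frequency (row count) and proportion (percentage of rows) of each bin
freq <- as.vector(table(bins))
hist_df <- data.frame(bin = levels(bins),
                      count = freq,
                      percent = 100 * freq / length(x))
print(hist_df)
```

Base R's nclass.Sturges(x) applies the same rule and can be used to cross-check k.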
Usage
td_histogram_mle(data = NULL, auto.bin = NULL, custom.bin.table = NULL,
                 custom.bin.column = NULL, bin.size = NULL, start.value = NULL,
                 end.value = NULL, value.column = NULL, inclusion = "left",
                 groupby.columns = NULL, data.sequence.column = NULL,
                 custom.bin.table.sequence.column = NULL)
Arguments
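data: Required Argument. Specifies the tbl object that contains the input data (cars_hist in the examples).
auto.bin: Optional Argument. Specifies the rule used to calculate the bin width and number of bins automatically, for example "Sturges".
custom.bin.table: Optional Argument. Specifies the tbl object that contains the boundary values of custom bins (bin_breaks in the examples).
custom.bin.column: Optional Argument. Specifies the column of custom.bin.table that contains the bin boundary values.
bin.size: Optional Argument. For equal-width bins, specifies the width of each bin.
start.value: Optional Argument. For equal-width bins, specifies the lower bound of the first bin.
end.value: Optional Argument. For equal-width bins, specifies the upper bound of the last bin.
value.column: Required Argument. Specifies the name of the input column whose values are binned.
inclusion: Optional Argument. Specifies whether a value that lies exactly on a bin boundary is counted in the bin to the left ("left") or to the right ("right") of that boundary. Default: "left".
groupby.columns: Optional Argument. Specifies the input columns used to group the data, so that bins are counted separately for each group.
data.sequence.column: Optional Argument. Specifies the column(s) that uniquely identify each row of data. Used to ensure deterministic results.
custom.bin.table.sequence.column: Optional Argument. Specifies the column(s) that uniquely identify each row of custom.bin.table. Used to ensure deterministic results.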
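To illustrate how bin.size, start.value, end.value, and inclusion interact, here is a small local sketch in base R. equal_width_hist() is a hypothetical helper, not part of the package. Mapping inclusion onto the right argument of cut() is an assumption: "left" is read as "a boundary value belongs to the bin on its left", which gives bins of the form (a, b].

```r
# Hypothetical helper (not part of tdplyr): counts values in equal-width bins.
# Assumption: inclusion = "left" puts a boundary value in the bin to its left
# (bins (a, b]); inclusion = "right" puts it in the bin to its right ([a, b)).
equal_width_hist <- function(x, bin.size, start.value, end.value,
                             inclusion = c("left", "right")) {
  inclusion <- match.arg(inclusion)
  breaks <- seq(start.value, end.value, by = bin.size)
  # include.lowest = TRUE keeps start.value and end.value inside the outer bins;
  # values outside [start.value, end.value] become NA and are not counted.
  bins <- cut(x, breaks = breaks, right = (inclusion == "left"),
              include.lowest = TRUE)
  data.frame(bin_start = head(breaks, -1),
             bin_end = tail(breaks, -1),
             count = as.vector(table(bins)))
}

# Values 50 and 100 sit exactly on bin boundaries.
x <- c(0, 50, 60, 100, 150)
res_left <- equal_width_hist(x, bin.size = 50, start.value = 0,
                             end.value = 150, inclusion = "left")
res_right <- equal_width_hist(x, bin.size = 50, start.value = 0,
                              end.value = 150, inclusion = "right")
print(res_left$count)   # 2 2 1
print(res_right$count)  # 1 2 2
```

Only the bins that contain the boundary values change: 50 and 100 move one bin to the right when inclusion switches from "left" to "right".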
Value
Function returns an object of class "td_histogram_mle", which is a named
list containing Teradata tbl objects.
Named list members can be referenced directly with the "$" operator
using the following names:
output.table
output
Examples
library(tdplyr)
library(dplyr)
library(ggplot2)

# Get the current context/connection
con <- td_get_context()$connection

# Load example data.
loadExampleData("histogram_example", "cars_hist", "bin_breaks")

# The cars_hist table has the cylinder (cyl) and horsepower (hp) data for different car models.
cars_hist <- tbl(con, "cars_hist")

# The bin_breaks table has the boundary values for the custom bins used to generate the histogram.
bin_breaks <- tbl(con, "bin_breaks")

# Example 1 - Generate a histogram of car horsepower, using Sturges' rule to choose the bins.
td_histogram_out <- td_histogram_mle(data = cars_hist,
                                     auto.bin = "Sturges",
                                     value.column = "hp"
                                     )

# Plot the percentage of cars in each histogram bin.
ggplot(as.data.frame(td_histogram_out$output.table), aes(x = bin_end, y = bin_percent)) +
  geom_bar(stat = "identity", fill = "#FF6666") +
  labs(x = "Horsepower", y = "Percentage")

# Example 2 - Generate a histogram of car horsepower with a custom bin size, start value and
# end value. Boundary values are counted in the bin to the right, and the data is grouped by cyl.
td_histogram_out <- td_histogram_mle(data = cars_hist,
                                     bin.size = 50,
                                     start.value = 20,
                                     end.value = 400,
                                     value.column = "hp",
                                     inclusion = "right",
                                     groupby.columns = c("cyl")
                                     )

# Example 3 - Generate a histogram using custom bins from a custom table.
# The cylinder (cyl) column is also used to group the input data.
td_histogram_out <- td_histogram_mle(data = cars_hist,
                                     custom.bin.table = bin_breaks,
                                     custom.bin.column = "break_values",
                                     value.column = "hp",
                                     groupby.columns = c("cyl")
                                     )
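For experimenting without a Vantage connection, the following base R sketch approximates what Example 3 computes: counts and percentages per custom bin, computed separately for each cylinder group. It is only an illustration under stated assumptions. The built-in mtcars data stands in for cars_hist, grouped_hist() is a hypothetical helper, and the break values are made up because the contents of bin_breaks are not shown here.

```r
# Local analogue of Example 3 (base R, no database).
# Assumptions: mtcars stands in for cars_hist; break_values is made up.
break_values <- c(50, 100, 150, 200, 250, 350)

# Hypothetical helper: histogram of value.column for each group in groupby.column.
grouped_hist <- function(df, value.column, groupby.column, breaks) {
  groups <- split(df[[value.column]], df[[groupby.column]])
  rows <- lapply(names(groups), function(g) {
    x <- groups[[g]]
    counts <- as.vector(table(cut(x, breaks = breaks, include.lowest = TRUE)))
    data.frame(group = g,
               bin_start = head(breaks, -1),
               bin_end = tail(breaks, -1),
               count = counts,                       # row count per bin
               percent = 100 * counts / length(x))   # share of the group's rows
  })
  do.call(rbind, rows)
}

out <- grouped_hist(mtcars, "hp", "cyl", break_values)
print(out)
```

Because every mtcars horsepower value (52 to 335) falls inside the made-up breaks, the percentages sum to 100 within each cylinder group.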