Histograms are useful for assessing the shape of a data distribution. The Histogram function calculates the frequency distribution of a data set, using either the Sturges or the Scott algorithm to compute the binning (the bin width and the number of bins). The bin width is the range of values that each bin covers. Both algorithms make strong assumptions about the shape of the distribution (each is derived for approximately normal data), so the appropriate bin width ultimately depends on the actual data distribution and the goals of the analysis. The function maps each input row to one bin and returns, for each bin, the row count (frequency) and the percentage of rows (proportion).
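To make the binning rules concrete, the following Python sketch uses the textbook forms of Sturges' rule (number of bins k = ceil(log2 n) + 1) and Scott's rule (bin width h = 3.49 s n^(-1/3), where s is the sample standard deviation). It is an illustration only: the function names are hypothetical, and the exact formulas, rounding, and edge handling used by the Histogram function may differ.

```python
import math
import statistics

def sturges_bin_count(n):
    """Textbook Sturges' rule: k = ceil(log2(n)) + 1 bins for n values."""
    return math.ceil(math.log2(n)) + 1

def scott_bin_width(values):
    """Textbook Scott's rule: h = 3.49 * s * n^(-1/3), s = sample std dev."""
    n = len(values)
    s = statistics.stdev(values)
    return 3.49 * s * n ** (-1 / 3)

def bin_frequencies(values, num_bins):
    """Map each value to one of num_bins equal-width bins.

    Returns one (lower, upper, count, proportion) tuple per bin.
    Assumption of this sketch: bins are left-inclusive, and the maximum
    value is placed in the last bin.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; bin width would be zero")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Clamp so that v == hi lands in the last bin rather than past it.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    total = len(values)
    return [
        (lo + i * width, lo + (i + 1) * width, c, c / total)
        for i, c in enumerate(counts)
    ]

values = [1, 2, 3, 4, 5, 6, 7, 8]
k = sturges_bin_count(len(values))   # 4 bins for 8 values
histogram = bin_frequencies(values, k)
```

For these eight values, Sturges' rule gives four bins of width 1.75, and each bin receives two values (a proportion of 0.25).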
The ML Engine histogram implementation includes these capabilities:
- User-selected or automatic bin determination
- User-selected left-inclusive or right-inclusive binning
- Multiple histograms for distinct groups
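The difference between left-inclusive and right-inclusive binning, and the idea of one histogram per group, can be sketched in Python as follows. The function names, the `inclusion` parameter, and the choice to skip values outside the bin edges are assumptions made for this illustration; they are not the Histogram function's actual arguments or behavior.

```python
from bisect import bisect_left, bisect_right
from collections import Counter, defaultdict

def bin_index(value, edges, inclusion="left"):
    """Return the index of the bin containing value, or None if outside.

    edges is a sorted list of bin boundaries.
    inclusion="left":  bin i covers [edges[i], edges[i+1])
    inclusion="right": bin i covers (edges[i], edges[i+1]]
    """
    if inclusion == "left":
        idx = bisect_right(edges, value) - 1
    else:
        idx = bisect_left(edges, value) - 1
    if 0 <= idx < len(edges) - 1:
        return idx
    return None  # value falls outside every bin in this sketch

def grouped_histogram(rows, edges, inclusion="left"):
    """Count values per bin separately for each group.

    rows is an iterable of (group, value) pairs.
    Returns {group: {bin_index: count}}.
    """
    counts = defaultdict(Counter)
    for group, value in rows:
        idx = bin_index(value, edges, inclusion)
        if idx is not None:
            counts[group][idx] += 1
    return {g: dict(c) for g, c in counts.items()}

rows = [("a", 5), ("a", 10), ("b", 10), ("b", 15)]
edges = [0, 10, 20]
left = grouped_histogram(rows, edges, "left")
right = grouped_histogram(rows, edges, "right")
```

A value that sits exactly on a boundary, such as 10 here, falls in the upper bin [10, 20) under left-inclusive binning and in the lower bin (0, 10] under right-inclusive binning, so the same data can produce different counts.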