Required Syntax Elements for TD_Histogram - Teradata Vantage

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03
ON clause for InputTable
Accept the InputTable clause.
MethodType
Specify the method for calculating the frequency distribution of the dataset:
Available Methods Description
Sturges

The Sturges algorithm performs best when the data is normally distributed and n is at least 30.

Algorithm for calculating bin width:

w = r / (1 + log_2(n))

where:

w = bin width

r = data value range

n = number of elements in dataset
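
The Sturges formula above can be checked by hand with a few lines of Python. This is a minimal sketch for illustration only, not the TD_Histogram implementation; the function name `sturges_bin_width` is hypothetical.

```python
import math

def sturges_bin_width(values):
    """Return w = r / (1 + log2(n)) for a non-empty list of numbers."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    r = max(values) - min(values)  # r = data value range
    return r / (1 + math.log2(n))

# 32 values from 0 to 31: r = 31 and log2(32) = 5, so w = 31 / 6.
print(sturges_bin_width(list(range(32))))
```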

Scott

The Scott algorithm performs best on normally distributed data.

Algorithm for calculating bin width:

w = 3.49s / n^(1/3)

where:

w = bin width

s = standard deviation of data values

n = number of elements in dataset

r = data value range

Number of bins: r/w
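
As a rough illustration of the Scott formula, the sketch below computes w and the ratio r/w in Python. It rests on two assumptions that the text above does not settle: it uses the sample standard deviation (`statistics.stdev`), and it returns r/w unrounded because the rounding rule is not stated. The function name `scott_bins` is hypothetical.

```python
import statistics

def scott_bins(values):
    """Return (w, r / w) using w = 3.49 * s / n**(1/3)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    s = statistics.stdev(values)  # assumption: sample standard deviation
    if s == 0:
        raise ValueError("all values are identical, so the bin width is zero")
    r = max(values) - min(values)  # r = data value range
    w = 3.49 * s / n ** (1 / 3)
    return w, r / w                # r / w left unrounded (rounding not specified)

width, bin_count = scott_bins([1, 2, 3, 4, 5, 6, 7, 8])
print(width, bin_count)
```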

Variable-Width
  • Requires a MinMax table, which specifies the minimum value and the maximum value of each bin.
  • If one target column is specified, specify the minimum value in column1, maximum value in column2, and label of the bin in column3.
  • If more than one target column is specified, specify ColumnName in column1, minimum value in column2, maximum value in column3, and the label of the bin in column4.
    The maximum number of bins cannot exceed 10000 per column.
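
The multi-column MinMax layout described above (ColumnName, minimum, maximum, label) can be pictured as a list of rows. The following sketch, with a hypothetical helper `check_minmax_rows` and made-up sample rows, counts bins per column and enforces the 10000-bin limit. The check that the minimum does not exceed the maximum is an assumption, not a documented rule.

```python
from collections import Counter

MAX_BINS_PER_COLUMN = 10000  # limit stated in the text above

def check_minmax_rows(rows):
    """rows: iterable of (column_name, min_value, max_value, label) tuples."""
    counts = Counter()
    for column_name, min_value, max_value, label in rows:
        if min_value > max_value:  # assumption: a bin must not be inverted
            raise ValueError(f"bin {label!r}: minimum exceeds maximum")
        counts[column_name] += 1
    for column_name, count in counts.items():
        if count > MAX_BINS_PER_COLUMN:
            raise ValueError(f"{column_name}: {count} bins exceeds the limit")
    return dict(counts)

# Hypothetical rows for two target columns.
rows = [
    ("age", 0, 18, "minor"),
    ("age", 18, 65, "adult"),
    ("income", 0, 50000, "low"),
]
print(check_minmax_rows(rows))
```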
Equal-Width

Algorithm for calculating bin width:

w = (max - min)/k

where:

min = minimum value of the bins

max = maximum value of the bins

k = number of intervals into which algorithm divides dataset

Interval boundaries: min+w, min+2w, …, min+(k-1)w

  • Optional MinMax table.
  • If MinMax table is omitted, the TD_Histogram function internally computes the min value and max value from the input data for the target columns.
  • If a MinMax table is specified, lay out its columns as follows:
    • If one target column is specified, specify min value in column1 and max value in column2.
    • If more than one target column is specified, specify ColumnName in column1, min value in column2, and max value in column3.
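
The Equal-Width calculation can be sketched in Python as follows. The function `equal_width_boundaries` is hypothetical; it mirrors the description above by taking min and max from the data when no explicit values are supplied, and it is only an illustration of the formula, not the function's actual implementation.

```python
def equal_width_boundaries(values, k, min_value=None, max_value=None):
    """Return (w, interior boundaries) for k equal-width intervals."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Without explicit limits, derive min and max from the data.
    if min_value is None:
        min_value = min(values)
    if max_value is None:
        max_value = max(values)
    w = (max_value - min_value) / k
    # Interval boundaries: min + w, min + 2w, ..., min + (k - 1)w
    return w, [min_value + i * w for i in range(1, k)]

print(equal_width_boundaries([0, 3, 10], 5))
print(equal_width_boundaries([0, 3, 10], 4, min_value=0, max_value=20))
```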
TargetColumn
Specify the InputTable columns for which the histogram is to be calculated.
NBins
[Required with methods Equal-Width and Variable-Width] Specify the number of ranges (bins) as an integer.
If only one value is specified, it is applied to all the target columns. Otherwise, the number of NBins values must be equal to the number of target columns.
The maximum NBins value is 10000.
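
The rule for matching NBins values to target columns can be sketched as below. The helper `resolve_nbins` is hypothetical, and the lower bound of 1 is an assumption; the text above states only the maximum.

```python
MAX_NBINS = 10000  # maximum stated in the text above

def resolve_nbins(target_columns, nbins):
    """Map each target column to its NBins value."""
    if len(nbins) == 1:
        resolved = list(nbins) * len(target_columns)  # one value applies to all
    elif len(nbins) == len(target_columns):
        resolved = list(nbins)
    else:
        raise ValueError("give one NBins value, or one per target column")
    for value in resolved:
        # Assumption: at least 1 bin; only the upper limit is documented.
        if not isinstance(value, int) or not 1 <= value <= MAX_NBINS:
            raise ValueError(f"invalid NBins value: {value!r}")
    return dict(zip(target_columns, resolved))

print(resolve_nbins(["col_a", "col_b"], [10]))
print(resolve_nbins(["col_a", "col_b"], [10, 20]))
```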
Inclusion
Specify whether points that fall on a bin boundary are included in the bin to the left of the boundary or in the bin to the right of the boundary.
If only one value is specified, it is applied to all the target columns. Otherwise, the number of Inclusion values must be equal to the number of target columns.
Default: left
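
To make the left/right distinction concrete, the sketch below assigns a value to a bin index given a sorted list of interior boundaries. It is an illustration only: the function `bin_index` is hypothetical, and how TD_Histogram treats values at the overall minimum and maximum is not covered here.

```python
from bisect import bisect_left, bisect_right

def bin_index(x, boundaries, inclusion="left"):
    """Return the 0-based bin index of x for sorted interior boundaries."""
    if inclusion == "left":
        # A point on a boundary goes to the bin left of that boundary.
        return bisect_left(boundaries, x)
    if inclusion == "right":
        # A point on a boundary goes to the bin right of that boundary.
        return bisect_right(boundaries, x)
    raise ValueError("inclusion must be 'left' or 'right'")

boundaries = [2, 4, 6, 8]  # five bins between 0 and 10
print(bin_index(4, boundaries, "left"))   # bin ending at 4
print(bin_index(4, boundaries, "right"))  # bin starting at 4
```

Values that do not sit on a boundary land in the same bin under either setting.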
GroupByColumns
Specify the names of the InputTable columns that contain the group values for binning.
This argument must not have columns that are already specified in TargetColumn.
This argument does not support column ranges.
The maximum number of unique columns in the GroupByColumns argument is 2042.
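
The GroupByColumns constraints can be expressed as a small pre-flight check. This sketch uses a hypothetical helper, `check_group_by`, and compares column names as exact strings, which is a simplification; it does not attempt to reproduce the database's own name matching.

```python
MAX_GROUP_BY_COLUMNS = 2042  # limit stated in the text above

def check_group_by(target_columns, group_by_columns):
    """Return the unique group-by columns, or raise if a rule is broken."""
    overlap = set(target_columns) & set(group_by_columns)
    if overlap:
        raise ValueError(f"also listed in TargetColumn: {sorted(overlap)}")
    unique = list(dict.fromkeys(group_by_columns))  # keep first-seen order
    if len(unique) > MAX_GROUP_BY_COLUMNS:
        raise ValueError("too many unique GroupByColumns")
    return unique

print(check_group_by(["price"], ["region", "year", "region"]))
```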