Syntax | Histogram | Vantage Analytics Library - Syntax - Vantage Analytics Library

Vantage Analytics Library User Guide

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
Lake
VMware
Product
Vantage Analytics Library
Release Number
2.2.0
Published
March 2023
Language
English (United States)
Last Update
2024-01-02
dita:mapPath
ibw1595473364329.ditamap
dita:ditavalPath
iup1603985291876.ditaval
dita:id
zyl1473786378775
Product Category
Teradata Vantage
CALL td_analyze (
  'histogram',
  'required_parameter_list [ optional_parameter; [...] ]'
);
required_parameter_list
database = input_database_name;
tablename = input_table_name;
columns = { column_name [,...] | keyword };
optional_parameter
{ binstyle = bin_style |
  columnstoexclude = column_name [,...] |
  gensqlonly = { true | false } |
  outputdatabase = output_database_name |
  outputtablename = output_table_name |
  overlaycolumns = column_name [,...] |
  overwrite = { true | false } |
  statisticscolumns = column_name [,...] |
  style = { basic | crosstab } |
  where = expression
}

Syntax Elements

database
The database containing the table to analyze.
tablename
The table containing the columns to analyze.
columns
The columns to analyze.
keyword Description
all All columns.
allnumeric All numeric columns.
allnumericanddate All numeric and date columns.
binstyle
[Optional] The bin style, which is one of the following options:
Option Description
bins=n[,…] Create n bins of equal width.

If you specify multiple columns, you can specify the number of bins for each column.

If you specify fewer n values than columns, leftover columns have default number of bins, 10.

widths=n[,…] Create bins of width n.

If you specify multiple columns, you must specify a width for each column.

If you specify fewer n values than columns, an error message displays.

quantiles=n[,…] Create approximately n equally populated bins.

If you specify multiple columns, you can specify the approximate number of equally populated bins for each column.

If you specify fewer n values than columns, leftover columns have default number of equally populated bins, 10.

boundaries=n[,…]

or

boundaries={n[,…]}...

Create bins with boundaries at specified numbers.

Example: boundaries=0,50,100,150 creates 3 bins between 0 and 150 (0 to 50, 50 to 100, and 100 to 150).

If you specify multiple columns, you must specify boundaries for each column. Example: boundaries={0, 50000, 100000, 150000}, {0, 50, 100}

binwithminmax=n, min, max

or

boundaries={n, min, max}...

Create n bins ranging from min to max.

Example: binwithminmax=5,0,200 creates 5 bins ranging from 0 to 200.

If you specify multiple columns, you must specify n, min, and max for each column. Example: binwithminmax={10, 0, 200000}, {5, 0, 100}

For each bin, beginning range values are inclusive and ending range values are exclusive, with these exceptions:
  • The last ending range value is inclusive.
  • If you request equally populated bins, ending range values are inclusive, because they are the maximum values in the ranges.
  • If you request equal-width or specified-width bins, beginning and ending range values of bins of a date type column may be truncated to a whole date value. A truncated ending range value is inclusive. A truncated beginning range value is exclusive. Because truncation may not be obvious, beginning and ending date range values for equal-width or specified-width bins are approximate.
Default: bins=10
columnstoexclude
[Optional] The columns to exclude when columns specifies a keyword.
gensqlonly
[Optional] True returns the SQL for the function as a result set but does not run it.
False runs the SQL for the function but does not return it as a result set.
Default: false
outputdatabase
[Optional] The name of the database to contain the analysis results table.
outputtablename
[Optional] The name of the table to store the analysis results.
Default behavior: Function returns results as a result set but does not create an output table.
overlaycolumns
[Optional] The column or columns for which to calculate frequencies within each bin for each value. Each specified column must be a categorical variable with few values, and statisticscolumns cannot also specify it.
If you specify multiple columns, the function calculates frequencies for crosstabs of values.
overwrite
[Optional] Whether to drop the output tables before creating new ones.
Default: true
statisticscolumns
[Optional] The column or columns for which to calculate simple statistics (minimum, maximum, mean and standard deviation) in each bin. Each specified column must be numeric (not DATE), and overlaycolumns cannot also specify it.
style
[Optional] The histogram style:
Histogram Style Description
basic (default) Histogram for individual columns.

If columns specifies multiple columns, function repeats SELECT statement for each column.

crosstab Multidimensional histogram (combines columns).

If columns specifies multiple columns, function cross-tabulates all columns within single SELECT statement. If you also specify outputtablename, function creates output table only once, repeating INSERT SELECT statement.

where
[Optional] The expression in the SQL WHERE clause to include in the generated SQL to filter rows selected for analysis.
Examples:
  • where = cust_id > 0 includes this WHERE clause in the generated SQL:
    WHERE cust_id > 0
  • where = gender = ''F'' includes this WHERE clause in the generated SQL:
    WHERE gender='F'