CALL td_analyze (
'adaptivehistogram',
'required_parameter_list [ optional_parameter; [...] ]'
);
- required_parameter_list
database = input_database_name;
tablename = input_table_name;
columns = { column_name [,...] | keyword };
- optional_parameter
{ bins = bins [,...] |
columnstoexclude = column_name [,...] |
gensqlonly = { true | false } |
outputdatabase = output_database_name |
outputtablename = output_table_name |
overlaycolumns = column_name [,...] |
overwrite = { true | false } |
spikethreshold = spike_threshold |
subdivisionmethod = { means | quantiles } |
subdivisionthreshold = subdiv_threshold |
where = expression
}
Syntax Elements
- database
- The database containing the table to analyze.
- tablename
- The table containing the columns to analyze.
- columns
- The columns to analyze.
-
keyword |
Description |
all |
All columns. |
allnumeric |
All numeric columns. |
allnumericanddate |
All numeric and date columns. |
- bins
- [Optional] The number of equal width bins to create (for example, bins=5). If you specify multiple columns, you can specify multiple bin sizes (for example, bins=5, 10). If you specify fewer bin sizes than columns, leftover columns have the default size, 10.
- For each bin, beginning range values are inclusive and ending range values are exclusive, with these exceptions:
- The last ending range value is inclusive.
- The ending range value of a spike is inclusive (because the beginning and ending values of a spike are the same).
- The beginning range value of a bin that follows and adjoins a spike is exclusive (because this value is the same as the spike value).
- The ending range value of a quantile sub-bin is inclusive.
- columnstoexclude
- [Optional] The columns to exclude when columns specifies a keyword.
- gensqlonly
- [Optional] True returns the SQL for the function as a result set but does not run it.
- False runs the SQL for the function but does not return it as a result set.
- Default: false
- outputdatabase
- [Optional] The name of the database to contain the analysis results table.
- outputtablename
- [Optional] The name of the table to store the analysis results.
- Default behavior: Function returns results as a result set but does not create an output table.
- overwrite
- [Optional] Whether to drop the output tables before creating new ones.
- Default: true
- spikethreshold
- [Optional] The percentage of rows, an integer in the range [1, 100], above which an individual value of a column is identified as a separate bin. Values with this or a larger percentage of rows are identified as a Spike.
- Default: 10% of the total number of rows
- subdivisionmethod
- [Optional] The method for subdividing bins with too many values:
Subdivision Method |
Description |
means |
Use range of +/- standard deviation around mean value in bin. |
quantiles |
Use quantiles, producing approximately equally distributed bins. |
- subdivisionthreshold
- [Optional] The percentage of rows, an integer in the range [1, 100], above which a bin is subdivided into sub-bins.
- Default: 30% of the total number of rows
- where
- [Optional] The expression in the SQL WHERE clause to include in the generated SQL to filter rows selected for analysis.
- Examples:
- where = cust_id > 0 includes this WHERE clause in the generated SQL:
WHERE cust_id > 0
- where = gender = ''F'' includes this WHERE clause in the generated SQL:
WHERE gender='F'