Syntax | Data Explorer | Vantage Analytics Library - Syntax

CALL td_analyze (
  'dataexplorer',
  'required_parameter_list [ optional_parameter; [...] ]'
);

required_parameter_list

database = input_database_name [,...];
tablename = input_table_name [,...];
outputdatabase = output_database_name;

optional_parameter

{ bins = number_of_bins |
  binstyle = { bins | quantiles } |
  columns = column_name [,...] |
  frequencyoutputtablename = freq_output_table |
  gensql = { true | false } |
  histogramoutputtablename = hist_output_table |
  maxuniquecharvalues = max_unique_char_values |
  maxuniquenumvalues = max_unique_num_values |
  overwrite = { true | false } |
  restrictedfreqproc = { true | false } |
  restrictedthreshold = threshold |
  statisticalmethod = { population | sample } |
  statisticsoutputtablename = stat_output_table |
  statsoptions = { all | stat_option [,...] } |
  uniques = { true | false } |
  valuesoutputtablename = values_output_table |
  where = expression
}

Syntax Elements

database: The database or databases containing the table or tables to analyze.; The number of databases specified by database and the number of tables specified by tablename must be the same. They are matched by position, and you can repeat the database names. For example, database=d1,d1,d2 and tablename=t1,t2,t3 specifies that the tables to analyze are d1.t1, d1.t2, and d2.t3 Tables t1 and t2 are in database d1.
tablename: The table or tables containing the columns to analyze.; No table can be a view.
outputdatabase: The database in which to store the output tables specified by frequencyoutputtablename, histogramoutputtablename, statisticsoutputtablename, and valuesoutputtablename.

bins

[Optional] The number of bins to create in the histogram.

Default: 10

binstyle

[Optional] The bin style, which is one of the following options:

Option	Description
bins=n[,…]	Create n bins of equal width. If you specify multiple columns, you can specify the number of bins for each column. If you specify fewer n values than columns, leftover columns have default number of bins, 10.
widths=n[,…]	Create bins of width n. If you specify multiple columns, you must specify a width for each column. If you specify fewer n values than columns, an error message displays.
quantiles=n[,…]	Create approximately n equally populated bins. If you specify multiple columns, you can specify the approximate number of equally populated bins for each column. If you specify fewer n values than columns, leftover columns have default number of equally populated bins, 10.
boundaries=n[,…] or boundaries={n[,…]}...	Create bins with boundaries at specified numbers. Example: boundaries=0,50,100,150 creates 3 bins between 0 and 150 (0 to 50, 50 to 100, and 100 to 150). If you specify multiple columns, you must specify boundaries for each column. Example: boundaries={0, 50000, 100000, 150000}, {0, 50, 100}
binwithminmax=n, min, max or boundaries={n, min, max}...	Create n bins ranging from min to max. Example: binwithminmax=5,0,200 creates 5 bins ranging from 0 to 200. If you specify multiple columns, you must specify n, min, and max for each column. Example: binwithminmax={10, 0, 200000}, {5, 0, 100}

Option

Description

bins=n[,…]

Create n bins of equal width.

If you specify multiple columns, you can specify the number of bins for each column.

If you specify fewer n values than columns, leftover columns have default number of bins, 10.

widths=n[,…]

Create bins of width n.

If you specify multiple columns, you must specify a width for each column.

If you specify fewer n values than columns, an error message displays.

quantiles=n[,…]

Create approximately n equally populated bins.

If you specify multiple columns, you can specify the approximate number of equally populated bins for each column.

If you specify fewer n values than columns, leftover columns have default number of equally populated bins, 10.

boundaries=n[,…]

boundaries={n[,…]}...

Create bins with boundaries at specified numbers.

Example: boundaries=0,50,100,150 creates 3 bins between 0 and 150 (0 to 50, 50 to 100, and 100 to 150).

If you specify multiple columns, you must specify boundaries for each column. Example: boundaries={0, 50000, 100000, 150000}, {0, 50, 100}

binwithminmax=n, min, max

boundaries={n, min, max}...

Create n bins ranging from min to max.

Example: binwithminmax=5,0,200 creates 5 bins ranging from 0 to 200.

If you specify multiple columns, you must specify n, min, and max for each column. Example: binwithminmax={10, 0, 200000}, {5, 0, 100}

For each bin, beginning range values are inclusive and ending range values are exclusive, with these exceptions:

The last ending range value is inclusive.
If you request equally populated bins, ending range values are inclusive, because they are the maximum values in the ranges.
If you request equal-width or specified-width bins, beginning and ending range values of bins of a date type column may be truncated to a whole date value. A truncated ending range value is inclusive. A truncated beginning range value is exclusive. Because truncation may not be obvious, beginning and ending date range values for equal-width or specified-width bins are approximate.

Default: bins=10

columns

[Optional] The column or columns to analyze.

frequencyoutputtablename

[Optional] The name of the frequency output table. If you do not want to overwrite previous results, specify a unique name.

Requires valuesoutputtablename.

Default: TwmExploreFrequency

gensql

[Optional] True returns the SQL for the function as a result set after initiating, up to a limited number of characters.

False runs the SQL for the function but does not return the SQL as a result set.

gensql differs from the gensqlonly parameter of other functions where the generated SQL is not run.

Default: false

histogramoutputtablename

[Optional] The name of the histogram output table. If you do not want to overwrite previous results, specify a unique name.

Requires valuesoutputtablename and statisticsoutputtablename.

Default: TwmExploreHistogram

maxuniquecharvalues

[Optional] The maximum number of unique character values for unrestricted frequency analysis.

Default: 100

maxuniquenumvalues

[Optional] The maximum number of unique numeric or date values for frequency analysis.

Default: 20

overwrite

[Optional] Whether to drop the output tables before creating new ones.

Default: true

restrictedfreqproc

[Optional] Whether to perform restricted Frequency analysis on character columns with more unique values than maxuniquecharvalues specifies. That is, whether to include their prominent values in the frequency output table.

A prominent value appears in the minimum percentage of rows that restrictedthreshold specifies if the ratio of unique values to rows is less than 100-threshold%.

Default: true

restrictedthreshold

[Optional] The minimum percentage of rows in which a value must occur to be included in the results.

Default: 1

statisticalmethod

[Optional] Whether to calculate the statistic by population or sample.

Default: population

statisticsoutputtablename

[Optional] The name of the statistics output table. If you do not want to overwrite previous results, specify a unique name.

Default: TwmExploreStatistics

statsoptions

[Optional] The statistic or statistics to calculate for the specified column or columns, either stat_option or its abbreviation.

stat_option (Abbreviation)	Description	Formula
stat_option (Abbreviation)	Description	statisticalmethod (population) (Default)	statisticalmethod (sample)
all	All of the following.
count (cnt)	Number of rows (observations) n with values for variable x.	n
minimum (min)	Minimum value in column.
maximum (max)	Maximum value in column.
mean	Arithmetic mean (average) of variable.
standarddeviation (std)	Standard deviation of variable. Measures how widely values are dispersed from mean.
skewness (skew)	Skewness of variable. Characterizes degree of asymmetry of distribution around mean. Positive skewness indicates distribution with asymmetric tail extending toward more positive values. Negative skewness indicates distribution with asymmetric tail extending toward more negative values. Skewness is undefined when standard deviation of variable is 0 or count is less than 3.	statisticalmethod (population): s is standard deviation of population. statisticalmethod (sample): s is standard deviation of sample.
kurtosis (kurt)	Kurtosis of variable. Characterizes relative peakedness or flatness of distribution compared with normal distribution. Positive kurtosis indicates relatively peaked distribution. Negative kurtosis indicates relatively flat distribution. Kurtosis is undefined when standard deviation of variable is 0 or count is less than 4.	statisticalmethod (population): s is standard deviation of population. statisticalmethod (sample): s is standard deviation of sample.
standarderror (ste)	Standard error of variable. Standard deviation divided by square root of count.
coefficientofvariance (cv)	Coefficient of variance of variable. 100 times standard deviation divided by mean.
variance (var)	Variance of variable. Square of standard deviation.
sum	Sum of variable.
uncorrectedsumofsquares (uss)	Uncorrected sum of squares of variable.
correctedsumofsquares (css)	Corrected sum of squares of variable.

Default behavior: Function calculates only count, minimum, maximum, mean, and standarddeviation.

uniques

[Optional] Whether to count unique values for each selected column.

Default: false

valuesoutputtablename

[Optional] The name of the values output table. If you do not want to overwrite previous results, specify a unique name.

Default: TwmExploreValues

where

[Optional] The expression in the SQL WHERE clause to include in the generated SQL to filter rows selected for analysis.

Examples:

where = cust_id > 0 includes this WHERE clause in the generated SQL:
```
WHERE cust_id > 0
```
where = gender = ''F'' includes this WHERE clause in the generated SQL:
```
WHERE gender='F'
```
Use this form of the where parameter to request a WHERE clause for each input table. The WHERE clause must match positionally with the tablename and database parameters with each WHERE clause in braces {} as in the following example:
```
where=
	{cust_id < 1362490},
	{cust_id < 1362490};
```

Syntax | Data Explorer | Vantage Analytics Library - Syntax - Vantage Analytics Library

Vantage Analytics Library User Guide

Syntax Elements