Syntax | Statistics Analysis | Vantage Analytics Library - Syntax

CALL td_analyze (
  'statistics',
  'required_parameter_list [ optional_parameter; [...] ]'
);

required_parameter_list

database = input_database_name;
tablename = input_table_name;
columns = { column_name [,...] | keyword };

optional_parameter

{ columnstoexclude = column_name [,...] |
  extendedoptions = ext_option [,...] |
  gensqlonly = { true | false } |
  groupby = column_name [,...] |
  outputdatabase = output_database_name |
  outputtablename = output_table_name |
  overwrite = { true | false } |
  statisticalmethod = { population | sample } |
  statsoptions = stat_option [,...] |
  where = expression
}

Syntax Elements

database

The database containing the table to analyze.

tablename

The table containing the columns to analyze.

columns

The columns to analyze.

keyword	Description
all	All columns.
allnumeric	All numeric columns.
allnumericanddate	All numeric and date columns.

If you specify multiple columns:

The function builds a VOLATILE table, processing all columns in a single CREATE VOLATILE TABLE AS SELECT statement and reformatting data with individual INSERT/SELECT statements into the final output dataset.
tablename cannot specify a view.

columnstoexclude

[Optional] The columns to exclude when columns specifies a keyword.

extendedoptions

[Optional] The additional statistic or statistics to calculate for the specified column or columns.

ext_option	Description
none (default)	No additional statistics.
modes	Modal (most frequently occurring value in column).
quantiles	Bottom 10 percentiles, deciles, quartiles, tertiles, and top 10 percentiles of values in column.
values	Number of rows in column with non-NULL values. Number of rows in column with NULL values. Number of rows in column with unique values. Number of rows in column with value 0. Number of rows in column with positive values. Number of rows in column with negative values. Number of rows in column with blank values.
rank	5 smallest values in column, 5 largest values in column, and number of times that each of these values appears in column.
all	modes, quantiles, values, and rank.

gensqlonly

[Optional] True returns the SQL for the function as a result set but does not run it.

False runs the SQL for the function but does not return it as a result set.

Default: false

groupby

[Optional] The input table columns for which to separately analyze each value or combination of values.

Default behavior: Input is not grouped.

outputdatabase

[Optional] The name of the database to contain the analysis results table.

outputtablename

[Optional] The name of the table to store the analysis results.

Default behavior: Function returns results as a result set but does not create an output table.

overwrite

[Optional] Whether to drop the output tables before creating new ones.

Default: true

statisticalmethod

[Optional] Whether to calculate the statistic by population or sample.

Default: population

statsoptions

[Optional] The statistic or statistics to calculate for the specified column or columns, either stat_option or its abbreviation.

stat_option (Abbreviation)	Description	Formula
stat_option (Abbreviation)	Description	statisticalmethod (population) (Default)	statisticalmethod (sample)
all	All of the following.
count (cnt)	Number of rows (observations) n with values for variable x.	n
minimum (min)	Minimum value in column.
maximum (max)	Maximum value in column.
mean	Arithmetic mean (average) of variable.
standarddeviation (std)	Standard deviation of variable. Measures how widely values are dispersed from mean.
skewness (skew)	Skewness of variable. Characterizes degree of asymmetry of distribution around mean. Positive skewness indicates distribution with asymmetric tail extending toward more positive values. Negative skewness indicates distribution with asymmetric tail extending toward more negative values. Skewness is undefined when standard deviation of variable is 0 or count is less than 3.	statisticalmethod (population): s is standard deviation of population. statisticalmethod (sample): s is standard deviation of sample.
kurtosis (kurt)	Kurtosis of variable. Characterizes relative peakedness or flatness of distribution compared with normal distribution. Positive kurtosis indicates relatively peaked distribution. Negative kurtosis indicates relatively flat distribution. Kurtosis is undefined when standard deviation of variable is 0 or count is less than 4.	statisticalmethod (population): s is standard deviation of population. statisticalmethod (sample): s is standard deviation of sample.
standarderror (ste)	Standard error of variable. Standard deviation divided by square root of count.
coefficientofvariance (cv)	Coefficient of variance of variable. 100 times standard deviation divided by mean.
variance (var)	Variance of variable. Square of standard deviation.
sum	Sum of variable.
uncorrectedsumofsquares (uss)	Uncorrected sum of squares of variable.
correctedsumofsquares (css)	Corrected sum of squares of variable.

Default behavior: Function calculates only count, minimum, maximum, mean, and standarddeviation.

where

[Optional] The expression in the SQL WHERE clause to include in the generated SQL to filter rows selected for analysis.

Examples:

where = cust_id > 0 includes this WHERE clause in the generated SQL:
```
WHERE cust_id > 0
```
where = gender = ''F'' includes this WHERE clause in the generated SQL:
```
WHERE gender='F'
```

Syntax | Statistics Analysis | Vantage Analytics Library - Syntax - Vantage Analytics Library

Vantage Analytics Library User Guide

Syntax Elements