Syntax | Statistics Analysis | Vantage Analytics Library - Syntax - Vantage Analytics Library

Vantage Analytics Library User Guide

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
Lake
VMware
Product
Vantage Analytics Library
Release Number
2.2.0
Published
March 2023
Language
English (United States)
Last Update
2024-01-02
dita:mapPath
ibw1595473364329.ditamap
dita:ditavalPath
iup1603985291876.ditaval
dita:id
zyl1473786378775
Product Category
Teradata Vantage
CALL td_analyze (
  'statistics',
  'required_parameter_list [ optional_parameter; [...] ]'
);
required_parameter_list
database = input_database_name;
tablename = input_table_name;
columns = { column_name [,...] | keyword };
optional_parameter
{ columnstoexclude = column_name [,...] |
  extendedoptions = ext_option [,...] |
  gensqlonly = { true | false } |
  groupby = column_name [,...] |
  outputdatabase = output_database_name |
  outputtablename = output_table_name |
  overwrite = { true | false } |
  statisticalmethod = { population | sample } |
  statsoptions = stat_option [,...] |
  where = expression
}

Syntax Elements

database
The database containing the table to analyze.
tablename
The table containing the columns to analyze.
columns
The columns to analyze.
keyword Description
all All columns.
allnumeric All numeric columns.
allnumericanddate All numeric and date columns.
If you specify multiple columns:
  • The function builds a VOLATILE table, processing all columns in a single CREATE VOLATILE TABLE AS SELECT statement and reformatting data with individual INSERT/SELECT statements into the final output dataset.
  • tablename cannot specify a view.
columnstoexclude
[Optional] The columns to exclude when columns specifies a keyword.
extendedoptions
[Optional] The additional statistic or statistics to calculate for the specified column or columns.
ext_option Description
none (default) No additional statistics.
modes Modal (most frequently occurring value in column).
quantiles Bottom 10 percentiles, deciles, quartiles, tertiles, and top 10 percentiles of values in column.
values
  • Number of rows in column with non-NULL values.
  • Number of rows in column with NULL values.
  • Number of rows in column with unique values.
  • Number of rows in column with value 0.
  • Number of rows in column with positive values.
  • Number of rows in column with negative values.
  • Number of rows in column with blank values.
rank 5 smallest values in column, 5 largest values in column, and number of times that each of these values appears in column.
all modes, quantiles, values, and rank.
gensqlonly
[Optional] True returns the SQL for the function as a result set but does not run it.
False runs the SQL for the function but does not return it as a result set.
Default: false
groupby
[Optional] The input table columns for which to separately analyze each value or combination of values.
Default behavior: Input is not grouped.
outputdatabase
[Optional] The name of the database to contain the analysis results table.
outputtablename
[Optional] The name of the table to store the analysis results.
Default behavior: Function returns results as a result set but does not create an output table.
overwrite
[Optional] Whether to drop the output tables before creating new ones.
Default: true
statisticalmethod
[Optional] Whether to calculate the statistic by population or sample.
Default: population
statsoptions
[Optional] The statistic or statistics to calculate for the specified column or columns, either stat_option or its abbreviation.
stat_option

(Abbreviation)

Description Formula
statisticalmethod (population) (Default) statisticalmethod (sample)
all All of the following.  
count

(cnt)

Number of rows (observations) n with values for variable x. n
minimum

(min)

Minimum value in column.
Minimum value equation
maximum

(max)

Maximum value in column.
Maximum value equation
mean Arithmetic mean (average) of variable.

Mean value equation
standarddeviation

(std)

Standard deviation of variable.

Measures how widely values are dispersed from mean.


Standard deviation equation

Standard deviation (sample statistics selected)
skewness

(skew)

Skewness of variable.

Characterizes degree of asymmetry of distribution around mean. Positive skewness indicates distribution with asymmetric tail extending toward more positive values. Negative skewness indicates distribution with asymmetric tail extending toward more negative values.

Skewness is undefined when standard deviation of variable is 0 or count is less than 3.


Skewness equation

statisticalmethod (population): s is standard deviation of population.

statisticalmethod (sample): s is standard deviation of sample.

kurtosis

(kurt)

Kurtosis of variable.

Characterizes relative peakedness or flatness of distribution compared with normal distribution. Positive kurtosis indicates relatively peaked distribution. Negative kurtosis indicates relatively flat distribution.

Kurtosis is undefined when standard deviation of variable is 0 or count is less than 4.


Kurtosis equation

statisticalmethod (population): s is standard deviation of population.

statisticalmethod (sample): s is standard deviation of sample.

standarderror

(ste)

Standard error of variable.

Standard deviation divided by square root of count.


Standard error equation

Standard error (sample statistics selected)
coefficientofvariance

(cv)

Coefficient of variance of variable.

100 times standard deviation divided by mean.


Coefficient of variance equation

Coefficient of variance (sample statistics)
variance

(var)

Variance of variable.

Square of standard deviation.


Variance equation

Variance equation (sample statistics selected)
sum Sum of variable.
Sum equation
uncorrectedsumofsquares

(uss)

Uncorrected sum of squares of variable.

Uncorrected sum of squares equations
correctedsumofsquares

(css)

Corrected sum of squares of variable.

Corrected sum of squares equation
Default behavior: Function calculates only count, minimum, maximum, mean, and standarddeviation.
where
[Optional] The expression in the SQL WHERE clause to include in the generated SQL to filter rows selected for analysis.
Examples:
  • where = cust_id > 0 includes this WHERE clause in the generated SQL:
    WHERE cust_id > 0
  • where = gender = ''F'' includes this WHERE clause in the generated SQL:
    WHERE gender='F'