Syntax | Data Explorer | Vantage Analytics Library - Syntax - Vantage Analytics Library

Vantage Analytics Library User Guide

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
Lake
VMware
Product
Vantage Analytics Library
Release Number
2.2.0
Published
March 2023
Language
English (United States)
Last Update
2024-01-02
dita:mapPath
ibw1595473364329.ditamap
dita:ditavalPath
iup1603985291876.ditaval
dita:id
zyl1473786378775
Product Category
Teradata Vantage
CALL td_analyze (
  'dataexplorer',
  'required_parameter_list [ optional_parameter; [...] ]'
);
required_parameter_list
database = input_database_name [,...];
tablename = input_table_name [,...];
outputdatabase = output_database_name;
optional_parameter
{ bins = number_of_bins |
  binstyle = { bins | quantiles } |
  columns = column_name [,...] |
  frequencyoutputtablename = freq_output_table |
  gensql = { true | false } |
  histogramoutputtablename = hist_output_table |
  maxuniquecharvalues = max_unique_char_values |
  maxuniquenumvalues = max_unique_num_values |
  overwrite = { true | false } |
  restrictedfreqproc = { true | false } |
  restrictedthreshold = threshold |
  statisticalmethod = { population | sample } |
  statisticsoutputtablename = stat_output_table |
  statsoptions = { all | stat_option [,...] } |
  uniques = { true | false } |
  valuesoutputtablename = values_output_table |
  where = expression
}

Syntax Elements

database
The database or databases containing the table or tables to analyze.
The number of databases specified by database and the number of tables specified by tablename must be the same. They are matched by position, and you can repeat the database names. For example, database=d1,d1,d2 and tablename=t1,t2,t3 specifies that the tables to analyze are d1.t1, d1.t2, and d2.t3 Tables t1 and t2 are in database d1.
tablename
The table or tables containing the columns to analyze.
No table can be a view.
outputdatabase
The database in which to store the output tables specified by frequencyoutputtablename, histogramoutputtablename, statisticsoutputtablename, and valuesoutputtablename.
bins
[Optional] The number of bins to create in the histogram.
Default: 10
binstyle
[Optional] The bin style, which is one of the following options:
Option Description
bins=n[,…] Create n bins of equal width.

If you specify multiple columns, you can specify the number of bins for each column.

If you specify fewer n values than columns, leftover columns have default number of bins, 10.

widths=n[,…] Create bins of width n.

If you specify multiple columns, you must specify a width for each column.

If you specify fewer n values than columns, an error message displays.

quantiles=n[,…] Create approximately n equally populated bins.

If you specify multiple columns, you can specify the approximate number of equally populated bins for each column.

If you specify fewer n values than columns, leftover columns have default number of equally populated bins, 10.

boundaries=n[,…]

or

boundaries={n[,…]}...

Create bins with boundaries at specified numbers.

Example: boundaries=0,50,100,150 creates 3 bins between 0 and 150 (0 to 50, 50 to 100, and 100 to 150).

If you specify multiple columns, you must specify boundaries for each column. Example: boundaries={0, 50000, 100000, 150000}, {0, 50, 100}

binwithminmax=n, min, max

or

boundaries={n, min, max}...

Create n bins ranging from min to max.

Example: binwithminmax=5,0,200 creates 5 bins ranging from 0 to 200.

If you specify multiple columns, you must specify n, min, and max for each column. Example: binwithminmax={10, 0, 200000}, {5, 0, 100}

For each bin, beginning range values are inclusive and ending range values are exclusive, with these exceptions:
  • The last ending range value is inclusive.
  • If you request equally populated bins, ending range values are inclusive, because they are the maximum values in the ranges.
  • If you request equal-width or specified-width bins, beginning and ending range values of bins of a date type column may be truncated to a whole date value. A truncated ending range value is inclusive. A truncated beginning range value is exclusive. Because truncation may not be obvious, beginning and ending date range values for equal-width or specified-width bins are approximate.
Default: bins=10
columns
[Optional] The column or columns to analyze.
frequencyoutputtablename
[Optional] The name of the frequency output table. If you do not want to overwrite previous results, specify a unique name.
Requires valuesoutputtablename.
Default: TwmExploreFrequency
gensql
[Optional] True returns the SQL for the function as a result set after initiating, up to a limited number of characters.
False runs the SQL for the function but does not return the SQL as a result set.
gensql differs from the gensqlonly parameter of other functions where the generated SQL is not run.
Default: false
histogramoutputtablename
[Optional] The name of the histogram output table. If you do not want to overwrite previous results, specify a unique name.
Requires valuesoutputtablename and statisticsoutputtablename.
Default: TwmExploreHistogram
maxuniquecharvalues
[Optional] The maximum number of unique character values for unrestricted frequency analysis.
Default: 100
maxuniquenumvalues
[Optional] The maximum number of unique numeric or date values for frequency analysis.
Default: 20
overwrite
[Optional] Whether to drop the output tables before creating new ones.
Default: true
restrictedfreqproc
[Optional] Whether to perform restricted Frequency analysis on character columns with more unique values than maxuniquecharvalues specifies. That is, whether to include their prominent values in the frequency output table.
A prominent value appears in the minimum percentage of rows that restrictedthreshold specifies if the ratio of unique values to rows is less than 100-threshold%.
Default: true
restrictedthreshold
[Optional] The minimum percentage of rows in which a value must occur to be included in the results.
Default: 1
statisticalmethod
[Optional] Whether to calculate the statistic by population or sample.
Default: population
statisticsoutputtablename
[Optional] The name of the statistics output table. If you do not want to overwrite previous results, specify a unique name.
Default: TwmExploreStatistics
statsoptions
[Optional] The statistic or statistics to calculate for the specified column or columns, either stat_option or its abbreviation.
stat_option

(Abbreviation)

Description Formula
statisticalmethod (population) (Default) statisticalmethod (sample)
all All of the following.  
count

(cnt)

Number of rows (observations) n with values for variable x. n
minimum

(min)

Minimum value in column.
Minimum value equation
maximum

(max)

Maximum value in column.
Maximum value equation
mean Arithmetic mean (average) of variable.

Mean value equation
standarddeviation

(std)

Standard deviation of variable.

Measures how widely values are dispersed from mean.


Standard deviation equation

Standard deviation (sample statistics selected)
skewness

(skew)

Skewness of variable.

Characterizes degree of asymmetry of distribution around mean. Positive skewness indicates distribution with asymmetric tail extending toward more positive values. Negative skewness indicates distribution with asymmetric tail extending toward more negative values.

Skewness is undefined when standard deviation of variable is 0 or count is less than 3.


Skewness equation

statisticalmethod (population): s is standard deviation of population.

statisticalmethod (sample): s is standard deviation of sample.

kurtosis

(kurt)

Kurtosis of variable.

Characterizes relative peakedness or flatness of distribution compared with normal distribution. Positive kurtosis indicates relatively peaked distribution. Negative kurtosis indicates relatively flat distribution.

Kurtosis is undefined when standard deviation of variable is 0 or count is less than 4.


Kurtosis equation

statisticalmethod (population): s is standard deviation of population.

statisticalmethod (sample): s is standard deviation of sample.

standarderror

(ste)

Standard error of variable.

Standard deviation divided by square root of count.


Standard error equation

Standard error (sample statistics selected)
coefficientofvariance

(cv)

Coefficient of variance of variable.

100 times standard deviation divided by mean.


Coefficient of variance equation

Coefficient of variance (sample statistics)
variance

(var)

Variance of variable.

Square of standard deviation.


Variance equation

Variance equation (sample statistics selected)
sum Sum of variable.
Sum equation
uncorrectedsumofsquares

(uss)

Uncorrected sum of squares of variable.

Uncorrected sum of squares equations
correctedsumofsquares

(css)

Corrected sum of squares of variable.

Corrected sum of squares equation
Default behavior: Function calculates only count, minimum, maximum, mean, and standarddeviation.
uniques
[Optional] Whether to count unique values for each selected column.
Default: false
valuesoutputtablename
[Optional] The name of the values output table. If you do not want to overwrite previous results, specify a unique name.
Default: TwmExploreValues
where
[Optional] The expression in the SQL WHERE clause to include in the generated SQL to filter rows selected for analysis.
Examples:
  • where = cust_id > 0 includes this WHERE clause in the generated SQL:
    WHERE cust_id > 0
  • where = gender = ''F'' includes this WHERE clause in the generated SQL:
    WHERE gender='F'
  • Use this form of the where parameter to request a WHERE clause for each input table. The WHERE clause must match positionally with the tablename and database parameters with each WHERE clause in braces {} as in the following example:
    where=
    	{cust_id < 1362490},
    	{cust_id < 1362490};