Data Explorer - Statistics Analysis - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 1: Introduction and Profiling

Teradata Warehouse Miner
July 2017
English (United States)

If requested, a Statistics analysis is performed on every selected column of numeric or date type. You may choose which statistics to calculate, but the minimum, maximum, mean and standard deviation are always calculated if the Histogram analysis is selected. Other available measures include skewness, kurtosis, standard error, coefficient of variance, variance, sum, uncorrected sum of squares and corrected sum of squares. You may choose to compute either sample statistics or population measures. For columns of type date, the minimum, maximum and mean dates are converted to integers that look like dates in ‘YYYYMMDD’ style, such as 20020823 for 2002-08-23, while other measures such as standard deviation are computed in units of days. Sum and sum of squares measures for dates are expressed in days since 1900 and are unlikely to be useful.
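The date handling described above can be illustrated with a short Python sketch. It is not the product's implementation: the function names are hypothetical, and rounding the mean to the nearest whole day is an assumption, since the guide does not say how a fractional mean date is handled.

```python
from datetime import date
import statistics


def date_to_yyyymmdd(d):
    """Encode a date as an integer that reads like YYYYMMDD."""
    return d.year * 10000 + d.month * 100 + d.day


def date_statistics(dates, population=False):
    """Summarize a list of dates (hypothetical helper, at least two dates)."""
    # Work in whole days so that spread measures come out in units of days.
    days = [d.toordinal() for d in dates]
    # Assumption: the mean date is rounded to the nearest whole day.
    mean_date = date.fromordinal(round(statistics.fmean(days)))
    spread = statistics.pstdev(days) if population else statistics.stdev(days)
    return {
        "minimum": date_to_yyyymmdd(min(dates)),
        "maximum": date_to_yyyymmdd(max(dates)),
        "mean": date_to_yyyymmdd(mean_date),
        "standard_deviation_days": spread,
    }


print(date_to_yyyymmdd(date(2002, 8, 23)))  # 20020823
```

The minimum, maximum and mean come back as date-like integers, while the standard deviation is a plain number of days, which mirrors the split the text describes.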

The general strategy in computing the Statistics function is to combine as many of the counts and measures for the various columns into as few Select statements as possible. Results from each Select statement are automatically placed in a temporary table; that is, each Select is actually an Insert-Select statement. The results, which may cover several columns, are then reorganized by further Insert-Select statements that move each variable’s results, one variable at a time, into the final answer table.
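As a rough illustration of this two-step strategy, the Python sketch below builds the kind of SQL text involved. The guide does not show the SQL that the product actually generates, so the table names, column aliases and the choice of four measures are all hypothetical; only the aggregate names MIN, MAX, AVG and STDDEV_SAMP are standard Teradata aggregates.

```python
# Measures gathered for every column in this sketch (a hypothetical subset).
MEASURES = ("MIN", "MAX", "AVG", "STDDEV_SAMP")


def combined_insert_select(source, work, columns):
    """Step 1: one Insert-Select that aggregates all columns in a single pass."""
    exprs = [
        f"{fn}({col}) AS {col}_{fn.lower()}"
        for col in columns
        for fn in MEASURES
    ]
    return f"INSERT INTO {work} SELECT {', '.join(exprs)} FROM {source};"


def per_column_inserts(work, final, columns):
    """Step 2: one Insert-Select per variable, moving its results to the answer table."""
    statements = []
    for col in columns:
        picks = ", ".join(f"{col}_{fn.lower()}" for fn in MEASURES)
        statements.append(
            f"INSERT INTO {final} SELECT '{col}', {picks} FROM {work};"
        )
    return statements


# Hypothetical table and column names, for illustration only.
print(combined_insert_select("sales", "work_stats", ["price", "qty"]))
for stmt in per_column_inserts("work_stats", "final_stats", ["price", "qty"]):
    print(stmt)
```

The point of the design is that the source table is scanned once for all columns, and the per-variable statements only touch the small one-row work table.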

In computing statistical measures, the Teradata aggregations for minimum, maximum, mean, standard deviation, skew, and kurtosis are used. When population measures are requested rather than sample statistics, formulas expressing population skew and population kurtosis in terms of their sample counterparts are used since these measures are not provided directly in Teradata.
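The guide does not give the conversion formulas, so the following Python sketch is only one plausible reading. It assumes that the sample skew and kurtosis follow the usual bias-adjusted definitions (kurtosis as excess kurtosis) and that the population measures are the plain moment ratios m3 / m2^1.5 and m4 / m2^2 - 3. Under those assumptions the population values can be recovered from the sample values and the row count n alone. All function names are hypothetical.

```python
import math


def sample_skew(xs):
    """Bias-adjusted sample skewness (assumed to match the database aggregate)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)


def sample_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis (assumed to match the aggregate)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    fourth = sum(((x - mean) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def population_skew_from_sample(skew, n):
    """Invert skew_sample = sqrt(n(n-1)) / (n-2) * skew_population."""
    return skew * (n - 2) / math.sqrt(n * (n - 1))


def population_kurtosis_from_sample(kurt, n):
    """Invert kurt_sample = (n-1) / ((n-2)(n-3)) * ((n+1) * kurt_population + 6)."""
    return (kurt * (n - 2) * (n - 3) / (n - 1) - 6) / (n + 1)


def population_moments(xs):
    """Population skewness and excess kurtosis computed directly from moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3
```

For any data set with at least four distinct rows, converting the sample values this way should agree with `population_moments` up to floating-point error, which gives a simple way to check the conversion.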