Data Explorer - INPUT - Analysis Parameters

Teradata Warehouse Miner User Guide - Volume 1: Introduction and Profiling

brand: Software
prodname: Teradata Warehouse Miner
vrm_release: 5.4.4
category: User Guide
featnum: B035-2300-077K

  1. On the Data Explorer dialog box, click INPUT.
  2. Click Analysis Parameters.
    Data Explorer > Input > Analysis Parameters

  3. On this screen, select:
    • Analyses to Perform
      • Values — Check box to include the Values analysis as part of the Data Explorer analysis execution.
        • Compute unique values for each column selected — By default, the Data Explorer Values analysis does not calculate the number of unique values within the specified column. Enabling this option adds that calculation to the analysis. This option must be enabled to run a basic “unrestricted” frequency analysis. For more information, see Data Explorer - Frequency Analysis.
      • Statistics — Check box to include the Statistics analysis as part of the Data Explorer analysis execution. Each of the following basic univariate statistics is individually selectable, unless Histogram is selected or Statistics graphs are desired. In either of those cases, at least Number of Values, Minimum Value, Maximum Value, Mean Value, and Standard Deviation must be selected. These same five calculations are selected by default when the Statistics option is enabled. Use the Check All and Clear All buttons to enable or disable all options.
        See Statistical Analysis for the mathematical equations for each univariate statistic. The following options are available:
        • Number of Values (required for Statistics graphs) — Include a count of the total number of rows (observations) with values for the specified column.
        • Minimum Value (required for Statistics graphs) — Include the calculation for the smallest value of the column.
        • Maximum Value (required for Statistics graphs) — Include the calculation for the largest value of the column.
        • Mean Value (required for Statistics graphs) — Include the calculation for the average value of the column.
        • Standard Deviation (required for Statistics graphs) — Include the calculation for the standard deviation of the variable. The standard deviation is a measure of how widely values are dispersed from the average value (the mean). The calculation changes depending on whether Population or Sample Statistics are chosen.
        • Skewness — Include the calculation for skewness of the variable. The skewness of the variable is a characterization of the degree of asymmetry of a distribution around its mean. Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values. Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values.
          The measures for Skewness (and Kurtosis) that are provided by Teradata Warehouse Miner are also known as the “Fisher g statistics,” related to the “momental skewness and kurtosis” [D’Agostino, Belanger, and D’Agostino Jr.].
          The equation for Skewness changes depending on whether Population or Sample Statistics are chosen. Note that skewness is undefined when the standard deviation of the variable is equal to 0 or the number of occurrences is less than 3.
        • Kurtosis — Include the calculation for the kurtosis of the variable. The kurtosis of the variable is a characterization of the relative peakedness or flatness of a distribution compared with the normal distribution. Positive kurtosis indicates a relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution.
          The measures for Kurtosis (and Skewness) that are provided by Teradata Warehouse Miner are also known as the “Fisher g statistics,” related to the “momental skewness and kurtosis” [D’Agostino, Belanger, and D’Agostino Jr.].
          The equation for Kurtosis changes depending on whether Population or Sample Statistics are chosen. Note that kurtosis is undefined when the standard deviation of the variable is equal to 0 or the number of occurrences is less than 4.
        • Standard Error — Include the calculation for the standard error of the variable. The standard error is calculated as the standard deviation divided by the square root of the number of occurrences. Different equations for calculating standard error are used depending on whether Population or Sample Statistics are chosen.
        • Coefficient of Variance — Include the calculation for the coefficient of variance of the variable. The coefficient of variance is calculated as 100 times the standard deviation divided by the mean. The equation for coefficient of variance changes depending on whether Population or Sample Statistics are chosen. Note that coefficient of variance is undefined when the mean of the variable is 0.
        • Variance — Include the calculation for the variance of the variable. The variance is calculated as the square of the standard deviation. The equation for Variance changes depending on whether Population or Sample Statistics are chosen.
        • Sum — Include the calculation for the sum of the variable.
        • Uncorrected Sums of squares — Include the calculation for the uncorrected sums of squares of the variable.
        • Corrected Sums of squares — Include the calculation for the corrected sums of squares of the variable.
        • Statistical Method

          - Population — Use population statistics for the statistical calculations.

          - Sample — Use sample statistics for those statistical calculations where the calculation changes.

      • Frequency — Include a Frequency analysis in the execution of the Data Explorer. Note that selecting this option automatically enables a Values analysis. If this option is selected, either the Compute unique values for each column selected option under Values or the Restricted Frequency Processing option on the Expert Options tab must also be selected. See the description of the options in Data Explorer - INPUT - Expert Options for an explanation of those parameters that influence the Frequency analysis.
      • Histogram — Include a quantile or equally distributed Histogram analysis in the execution of the Data Explorer. Note that selecting this option automatically enables a Statistics analysis. See the description of options in Data Explorer - INPUT - Expert Options for an explanation of those parameters that influence the Histogram analysis.
        • Number of Bins — The total number of quantile or equally distributed bins. Defaults to 10.
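
The statistics options described above can be illustrated outside the product with a short Python sketch. This is not Teradata Warehouse Miner code, and the product's exact equations are given in Statistical Analysis, not here. The sketch assumes the common textbook definitions: population statistics divide by n, sample statistics divide by n - 1, and skewness and kurtosis use the Fisher g formulas with the usual sample adjustments. The function names univariate_stats and equal_width_bin_edges, the dictionary keys, and the way equal-width bin edges are derived from the minimum and maximum are all hypothetical and chosen only for illustration.

```python
import math


def univariate_stats(values, sample=False):
    """Return basic univariate statistics for a list of numbers.

    sample=False uses population formulas (divide by n);
    sample=True uses sample formulas (divide by n - 1).
    Statistics that are undefined for the input are returned as None.
    These are textbook formulas, assumed here for illustration only.
    """
    n = len(values)
    if n == 0:
        raise ValueError("at least one value is required")

    total = sum(values)
    mean = total / n
    uss = sum(x * x for x in values)            # uncorrected sum of squares
    css = sum((x - mean) ** 2 for x in values)  # corrected sum of squares

    divisor = n - 1 if sample else n
    variance = css / divisor if divisor > 0 else None
    std = math.sqrt(variance) if variance is not None else None

    stats = {
        "count": n,
        "minimum": min(values),
        "maximum": max(values),
        "mean": mean,
        "sum": total,
        "uncorrected_ss": uss,
        "corrected_ss": css,
        "variance": variance,
        "std_dev": std,
        # standard deviation divided by the square root of n
        "std_error": std / math.sqrt(n) if std is not None else None,
        # 100 * standard deviation / mean; undefined when the mean is 0
        "coeff_of_variance": (
            100 * std / mean if std is not None and mean != 0 else None
        ),
        "skewness": None,
        "kurtosis": None,
    }

    m2 = css / n  # second central moment
    if m2 > 0:    # both measures are undefined for a constant column
        if n >= 3:
            g1 = (sum((x - mean) ** 3 for x in values) / n) / m2 ** 1.5
            if sample:
                g1 *= math.sqrt(n * (n - 1)) / (n - 2)
            stats["skewness"] = g1
        if n >= 4:
            g2 = (sum((x - mean) ** 4 for x in values) / n) / m2 ** 2 - 3
            if sample:
                g2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
            stats["kurtosis"] = g2
    return stats


def equal_width_bin_edges(minimum, maximum, n_bins=10):
    """Return n_bins + 1 edges splitting [minimum, maximum] into equal widths."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    width = (maximum - minimum) / n_bins
    return [minimum + i * width for i in range(n_bins)] + [maximum]


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(univariate_stats(data))               # population statistics
    print(univariate_stats(data, sample=True))  # sample statistics
    print(equal_width_bin_edges(min(data), max(data)))
```

Running the sketch on the same data with sample=False and sample=True shows which values move with the Statistical Method choice (variance, standard deviation, standard error, coefficient of variance, skewness, kurtosis) and which do not (count, minimum, maximum, mean, sum, and both sums of squares).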