- It can provide business insight in its own right.
- It uncovers data quality issues which, if not corrected or compensated for, can jeopardize the accuracy of any analytic models that are based on the data.
- It isolates the data that should be used in building analytic models. For example, outlying values should sometimes be excluded from a model; in other cases, these values might be required to solve a particular business problem. Further, some statistical processes used in analytic modeling require a certain type of distribution of data.
In the case of the Descriptive Statistics, NULL values are handled through the generated SQL’s aggregate functions. In this case, SQL ignores the NULL value and adjusts the number of observations in its calculation. This effectively provides a listwise deletion of NULL values.
- Adaptive Histogram — Determine the distribution of a numeric column(s) giving counts, sub-binning column(s) with higher counts and determining data spikes.
- Correlation Matrix — Build and view a correlation matrix.
- Data Explorer — Automated exploration of any number of tables or views within an entire database.
- Frequency — Compute frequency of column values or multi-column combined values. Optionally, compute frequency of values for pairs of columns in a single column list or two column lists, and generate simple statistics for any other column within a table.
- Histogram — Determine the distribution of a numeric column(s) giving counts with optional overlay counts and statistics.
- Overlap — Count overlapping column values in combinations of tables (i.e., find “key” values in common between tables).
- Scatter Plot — Plot sampled values of two to three variables in 2-D or 3-D.
Statistical Analysis — Determine any of the following descriptive statistics for numeric column(s):
- Minimum Value
- Maximum Value
- Mean Value
- Standard Deviation
- Standard Mean Error
- Coefficient of Variance
- Uncorrected Sums of squares
- Corrected Sums of squares
- Values Count
- Modal Value
- Top 5/Bottom 5 Rank and Values
- Text Field Analyzer — Analyze character data and help distinguish whether the field is a numeric type, a date, a time, a timestamp, or character data.
Values Analysis — Count the number of values of various kinds for a given column or columns, including:
- Number of Rows
- Rows with Non-NULL Values
- Rows with NULL Values
- Unique Values
- Rows with Value ‘0’
- Rows with a Positive Value
- Rows with a Negative Value
- Rows Containing Blank Values
In order to add a Descriptive Statistical analysis to a Teradata Warehouse Miner Data Mining Project, create a new analysis with any of the mechanisms described in Using Teradata Warehouse Miner. This will produce the following dialog.Add New Analysis dialog
Double-click or highlight the desired analysis, optionally change the default name and click OK. Each of these specific analyses are described in detail in the subsequent sections.