Purpose
A Values analysis is often useful as the first type of analysis to perform on data that is relatively unknown to the analyst. It helps determine the nature and overall quality of the data. For example, whether the data is categorical or continuously numeric, how many null values it contains, etc.
A Values analysis provides a count of the number of rows, rows with non-null values, rows with null values, rows with value 0, rows with a positive value, rows with a negative value, and the number of rows containing blanks in the given column. By default, unique values are counted, but this calculation can be inhibited for performance reasons if desired.
Note that for a column of non-numeric type, the zero, positive and negative counts will always be zero (for example, 000 is not counted as 0).
A Values analysis can be performed on columns of any data type, though the measures displayed vary according to column type.
Syntax
call twm. td_analyze('Values','database=twm_source;tablename=twm_customer_analysis;columns=income,marital_status;Optional Parameters;');Required Parameters
- columns
- The columns to be analyzed.
- database
- The database containing the table to be analyzed.
- tablename
- The table containing the columns to be analyzed.
- Values
- The Values parameter:
- Is required
- Must be the first parameter
- Is always enclosed in single quotes
Optional Parameters
- groupby
-
If one or more columns are specified with the groupby parameter, a separate analysis is performed for each value or combination of values in the specified columns. For example: groupby=gender,marital_status.
- outputdatabase
- This parameter specifies the name of the database that will contain the analysis results table.
- outputtablename
- Specifies the name of the table that will store the analysis results. If this parameter is not supplied, the results are returned as a result set.
- uniques
- When set to true, unique values are counted for each selected column. By default, unique values are set to true and not counted.
- where
- Option to have the specified SQL WHERE clause generated within the Histogram SQL to filter rows selected for analysis. For example: where=cust_id > 0.
Examples
The examples in this section demonstrate the invocation of the Values analysis.These examples assume that the td_analyze function has been installed in a database named twm.
The first example uses a minimal number of parameters and does not produce an output table.
call twm.td_analyze('Values','database=twm_source;tablename=twm_customer_analysis;columns=income,marital_status;');
The second example builds on the first example and produces an output table.
call twm.td_analyze('Values','database=twm_source;tablename=twm_customer_analysis;columns=income,marital_status;outputdatabase=twm_results;outputtablename=_twm_values;groupby=gender;where=cust_id > 0;uniques=false;');
This third example includes additional optional parameters.
call twm.td_analyze('Values','database=twm_source;tablename=twm_customer_analysis;columns=income,marital_status;outputdatabase=twm_results;outputtablename=_twm_values_table;groupby=gender;where=cust_id > 0;');