Purpose
Use a Values analysis as the first type of analysis performed on unknown data. Values analysis determines the nature and quality of the data. For example, whether the data is categorical or continuously numeric, how many null values it contains, and so on.
A Values analysis provides a count of rows, rows with non-null values, rows with null values, rows with value 0, rows with a positive value, rows with a negative value, and the number of rows containing blanks in the given column. By default, unique values are counted, but this calculation can be inhibited for performance reasons if desired.
For a column of non-numeric type, the zero, positive, and negative counts are always zero (for example, 000 is not counted as 0).
A Values analysis can be performed on columns of any data type, though the measures displayed vary according to column type.
Syntax
call twm. td_analyze('Values','database=twm_source;tablename=twm_customer_analysis;columns=income,marital_status;Optional Parameters;');Required Parameters
- columns
- The columns to analyze. Enter columns=all to analyze all columns in the input table.
For example: columns=column1,column2,column3 or columns=all
- database
- The database containing the table to analyze.
- tablename
- The table containing the columns to analyze.
- Values
- The Values parameter:
- Is required
- Must be the first parameter
- Is always enclosed in single quotes
Optional Parameters
- gensqlonly
- When true, the SQL for the requested function is returned as a result set but not run. When not specified or set to false, the SQL is run but not returned.
- groupby
-
If columns are specified with the groupby parameter, a separate analysis is performed for each value or combination of values in the specified columns. For example: groupby=gender,marital_status.
- outputdatabase
- Specifies the name of the database to contain the analysis results table.
- outputtablename
- Specifies the name of the table to store the analysis results. If not supplied, the results are returned as a result set.
- overwrite
-
When overwrite is set to true (default), the output tables are dropped before creating new ones.
- uniques
- The unique values count for each selected column when this parameter is set to true. By default, the uniques parameter is set to false and not counted.
- where
- Specifies the SQL WHERE clause generated within the Histogram SQL to filter rows selected for analysis. For example: where=cust_id > 0.
Examples
These examples demonstrate the invocation of the Values analysis.To run the provided examples, the td_analyze function must be installed in a database called twm and the TWM tutorial data must be installed in the twm_source database.
The first example uses a minimal number of parameters and does not produce an output table.
call twm.td_analyze('Values','database=twm_source;tablename=twm_customer_analysis;columns=income,marital_status;');
The second example builds on the first example and produces an output table.
call twm.td_analyze('Values','database=twm_source;tablename=twm_customer_analysis;columns=income,marital_status;outputdatabase=twm_results;outputtablename=_twm_values_table;groupby=gender;where=cust_id > 0;');
This third example includes additional optional parameters.
call twm.td_analyze('Values','database=twm_source;tablename=twm_customer_analysis;columns=income,marital_status;outputdatabase=twm_results;outputtablename=_twm_values;groupby=gender;where=cust_id > 0;uniques=false;');