TD_ColumnSummary is a function where the contents of a column are grouped and presented with respect to certain standard values. Summarizing a column in terms of positive, negative, null, and other similar values can be useful in several ways:
- Data Cleaning: Summarizing the column in terms of null and not null values can identify missing data, which is important for data cleaning and analysis. Depending on the nature of the analysis, missing data can either be removed or imputed.
- Data Exploration: Summarizing the column in terms of positive and negative values can identify trends or patterns in the data. For example, if the column represents the profit of a business, a large number of negative values may indicate that the business is not performing well.
- Data Validation: Summarizing the column in terms of positive and negative values can help identify errors or anomalies in the data. For example, if the column represents the height of people, a negative value is impossible and would indicate an error in the data.
- Reporting: Summarizing the column in terms of positive, negative, null, and not null values can provide a quick summary of the data that can be included in reports or presentations.
In general, summarizing a column in terms of positive, negative, null, not null, zeros, blanks and their respective percentages can help provide a quick overview of the data and identify potential issues or trends.
A column can be summarized with respect to the values it contains. To summarize a given column as blank values, zeros, null, not null, positive, negative and their respective percentages, follow the steps:
- Count the total number of values (n) in the column.
- Count the number of blank values (b) in the column.
- Count the number of zero values (z) in the column.
- Count the number of null values (null) in the column.
- Count the number of not null values (n - null) in the column.
- Count the number of positive values (pos) in the column.
- Count the number of negative values (neg) in the column.
- Calculate the percentage of blank values (b/n * 100) in the column.
- Calculate the percentage of zero values (z/n * 100) in the column.
- Calculate the percentage of null values (null/n * 100) in the column.
- Calculate the percentage of positive values (pos/n * 100) in the column.
- Calculate the percentage of negative values (neg/n * 100) in the column.
Such a summary is used for exploratory data analysis of each column in a given database and its respective table. A correlation of the columns in terms of their summary can direct us towards their dependency on each other.
TD_ColumnSummary displays the following for each specified input table column:
- Column name
- Column data type
- Count of these values:
- Non-NULL
- NULL
- Blank (all space characters) (NULL for numeric data type)
- Zero (NULL for nonnumeric data type)
- Positive (NULL for nonnumeric data type)
- Negative (NULL for nonnumeric data type)
- Percentage of NULL values
- Percentage of non-NULL values