TD_CategoricalSummary Example | CategoricalSummary - Example: How to Use TD_CategoricalSummary - Analytics Database

Database Analytic Functions

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Release Number
17.20
Published
June 2022
ft:locale
en-US
ft:lastEdition
2025-01-20
dita:mapPath
gjn1627595495337.ditamap
dita:ditavalPath
ayr1485454803741.ditaval
dita:id
jmh1512506877710
Product Category
Teradata Vantageā„¢

Every complete example in this document is available in a zip file that you can download. The zip file includes a SQL script file that creates the input tables for the examples. If you are reading this document on https://docs.teradata.com/, you can download the zip file from the attachment in the left sidebar.

TD_CategoricalSummary InputTable: cat_titanic_train

 passenger survived pclass name                                 gender age sibsp parch ticket   fare      cabin       embarked 
 --------- -------- ------ ------------------------------------ ------ --- ----- ----- -------- --------- ----------- -------- 
        97        0      1 Goldschmidt; Mr. George B            male    71     0     0 PC 17754 34.6542		A5          C       
       488        0      1 Kent; Mr. Edward Austin              male    58     0     0 11771    29.7		B37         C       
       505        1      1 Maioni; Miss. Roberta                female  16     0     0 110152   86.5		B79         S       
       631        1      1 Barkworth; Mr. Algernon Henry Wilson male    80     0     0 27042    30 		A23         S       
       873        0      1 Carlsson; Mr. Frans Olof             male    33     0     0 695      5 		B51 B53 B55 S

Example: TD_CategoricalSummary Call for Single Column

SELECT * FROM TD_CategoricalSummary (
  ON cat_titanic_train AS InputTable
  USING
  TargetColumns ('gender')
) AS dt;

TD_CategoricalSummary Output

 ColumnName DistinctValue DistinctValueCount 
 ---------- ------------- ------------------ 
 gender     female                         1
 gender     male                           4

Example: Calculating Statistics Using Multiple Columns

You can input multiple columns in the TargetColumns parameter to have these statistics calculated for all columns at the same time. embarked has been added.

SELECT * FROM TD_CategoricalSummary (
  ON cat_titanic_train AS InputTable
  USING
  TargetColumns ('embarked', 'gender')
) AS dt;

TD_Categorical Columns Output

 ColumnName DistinctValue DistinctValueCount 
 ---------- ------------- ------------------ 
 embarked   S                              3
 gender     female                         1
 embarked   C                              2
 gender     male                           4

Example: Calculating Statistics Using Indexing

Another way to run the query is to use indexing. Instead of writing column names, you can also use indexes. Indexes start from 0 (first column is indexed by 0) and then 1, and so on.

Input data for this (passengers_2):

name      gender    ticket     cabin    embarked
------    ------    --------   ------   --------
Harris    male      PC 17754   A3       C
Jim       male      PC 1754    A5       B
Taha      male      PC 1984    A8       C
Ahmed     male      PC 17754   A5       C
John      male      PC 1772    A8       A

Indexing Input Query

The reason to use this data is because TD_CategoricalSummary only works on categorical columns and this dataset only has categorical columns. The previous dataset had some integer and decimal datatype columns.

Running the following query gives the stats for the zeroth, first, and second columns:

SELECT * FROM TD_CategoricalSummary (
  ON passengers_2 AS InputTable
  USING
  TargetColumns ('[0:2]')
) AS dt;

Indexing Output

 ColumnName DistinctValue DistinctValueCount 
 ---------- ------------- ------------------ 
 name       Ahmed                          1
 name       Jim                            1
 name       Harris                         1
 name       John                           1
 ticket     PC 1772                        1
 ticket     PC 17754                       2
 gender     male                           5
 name       Taha                           1
 ticket     PC 1984                        1
 ticket     PC 1754                        1 

Statistics for All Columns Query

SELECT * FROM TD_CategoricalSummary (
  ON passengers_2 AS InputTable
  USING
  TargetColumns ('[:]')
) AS dt;

All Columns Output

 ColumnName DistinctValue DistinctValueCount 
 ---------- ------------- ------------------ 
 cabin      A3                             1
 name       Jim                            1
 cabin      A8                             2
 cabin      A5                             2
 name       Ahmed                          1
 ticket     PC 17754                       2
 embarked   B                              1
 embarked   A                              1
 ticket     PC 1772                        1
 name       Harris                         1
 embarked   C                              3
 ticket     PC 1984                        1
 gender     male                           5
 name       John                           1
 name       Taha                           1
 ticket     PC 1754                        1 

Using indexes in the TargetColumns parameter requires entering square brackets in quotes, and the entered column indexes need to be of the CHAR or VARCHAR datatype. If they are not, the query gives an error. You select all columns by entering the following [:] in quotes on the TargetColumns parameter.