A contingency table, also known as a two-way frequency table, is a table with at least two rows and two columns that is used in statistics to present categorical data as frequency counts.
A contingency table shows the observed frequencies of two categorical variables, with the labels of one variable arranged in rows and the labels of the other in columns. A cell is the intersection of a row and a column of a contingency table.
For example, a cell count n_ij represents the joint occurrence of row i and column j, where i ranges from 1 to r (the total number of rows) and j ranges from 1 to c (the total number of columns).
Consider a contingency table that cross-tabulates gender against smoking habits:
- The first column represents the first category, gender, which has two labels, female and male, each represented by one row.
- The second and third columns represent the second category, habits, which has two labels, smokers and non-smokers.
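As a minimal sketch of how a table with this layout could be assembled from raw observations, the following Python snippet counts the joint occurrences n_ij of each (gender, habit) pair. The observations are invented for illustration, and the function name `build_contingency_table` and the one-dict-per-row layout are hypothetical; none of them come from the function's interface.

```python
from collections import Counter

# Hypothetical raw observations as (gender, habit) pairs, invented for illustration.
observations = [
    ("female", "smokers"),
    ("female", "non-smokers"),
    ("male", "smokers"),
    ("male", "smokers"),
    ("female", "non-smokers"),
    ("male", "non-smokers"),
]


def build_contingency_table(pairs, label_column="gender"):
    """Return one dict per category-1 label with a count per category-2 label."""
    counts = Counter(pairs)  # counts[(i, j)] is the joint frequency n_ij
    row_labels = sorted({row for row, _ in pairs})
    col_labels = sorted({col for _, col in pairs})
    table = []
    for row in row_labels:
        record = {label_column: row}
        for col in col_labels:
            # Counter returns 0 for pairs that never occur together.
            record[col] = counts[(row, col)]
        table.append(record)
    return table


table = build_contingency_table(observations)
for record in table:
    print(record)
```

Each resulting record has the first-category label in the first field and one frequency field per second-category label, which mirrors the row and column arrangement described above.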
The second category can have at most 2046 unique labels. The function ignores NULL values in the table.
The maximum label length is 64000 for category_1 and 128 for all other columns.
For a valid test output, the value of each observed frequency in the CONTINGENCY table must be at least 5.
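The constraints above can be checked before the table is passed to the function. The sketch below is a hedged illustration, not part of the product: the helper name `check_contingency_rows`, the dict-per-row representation, and the use of `None` to stand in for NULL are all assumptions. It flags observed frequencies below 5 and rows with more than 2046 second-category labels, and it skips NULL cells rather than reporting them.

```python
MIN_COUNT = 5      # minimum observed frequency for a valid test output
MAX_LABELS = 2046  # maximum number of unique labels in the second category


def check_contingency_rows(rows, label_column):
    """Return a list of problem descriptions; an empty list means no problems."""
    problems = []
    for row in rows:
        row_label = row[label_column]
        count_columns = [name for name in row if name != label_column]
        if len(count_columns) > MAX_LABELS:
            problems.append(
                f"{row_label}: {len(count_columns)} labels exceed {MAX_LABELS}"
            )
        for name in count_columns:
            value = row[name]
            if value is None:
                # Assumption: None models a NULL cell, which is skipped here.
                continue
            if value < MIN_COUNT:
                problems.append(
                    f"{row_label}/{name}: observed frequency {value} "
                    f"is below {MIN_COUNT}"
                )
    return problems


# Hypothetical rows, invented for illustration.
rows = [
    {"gender": "female", "smokers": 12, "non-smokers": 30},
    {"gender": "male", "smokers": 3, "non-smokers": None},
]
for problem in check_contingency_rows(rows, "gender"):
    print(problem)
```

Returning a list of messages instead of raising on the first failure lets all undersized cells be reported in one pass.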
CONTINGENCY Table Schema
Column | Data Type | Description |
---|---|---|
Name of categorical column 1 | Any | The column can have one or multiple labels. Labels can be INTEGER, LATIN, or UTF8 code. |
category_2_label_1 | INTEGER, SMALLINT, BYTEINT, or BIGINT | Joint frequency of category 1 label i and category 2 label 1, where i ranges from 1 to r. |
category_2_label_2 | INTEGER, SMALLINT, BYTEINT, or BIGINT | Joint frequency of category 1 label i and category 2 label 2, where i ranges from 1 to r. |
... | ... | ... |
category_2_label_c | INTEGER, SMALLINT, BYTEINT, or BIGINT | [Column appears zero or more times.] Joint frequency of category 1 label i and category 2 label c, where i ranges from 1 to r. |