The FPGrowth function outputs either a pattern table, a rule table, or both (depending on the value of the PatternsOrRules argument).
The following table describes the columns of the pattern table.
Column Name | Data Type | Description |
---|---|---|
group_by_column | Same as input | Column copied from input table |
pattern_item_column | VARCHAR | Pattern composed of transaction items |
length_of_pattern | INTEGER | Number of items in pattern |
count | BIGINT | Count of occurrence of pattern |
support | DOUBLE PRECISION | Percentage of transactions that contain the pattern: count/t, where t is the number of transactions. For example, if eggs and milk were purchased together 3 times in 5 transactions, then the support value is 3/5, or 60%. |
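
The count and support columns can be reproduced by hand for a small data set. The following Python sketch is only an illustration of the count/t definition, not the function's implementation; the helper name `pattern_support` and the sample transactions are hypothetical.

```python
def pattern_support(transactions, pattern):
    """Return (count, support) for a pattern over a list of item sets.

    count   = number of transactions that contain every item in the pattern
    support = count / t, where t is the number of transactions
    """
    pattern = set(pattern)
    count = sum(1 for items in transactions if pattern <= set(items))
    return count, count / len(transactions)


# Hypothetical sample data: five transactions, three of which
# contain both eggs and milk.
transactions = [
    {"eggs", "milk", "bread"},
    {"eggs", "milk"},
    {"milk", "butter"},
    {"eggs", "milk", "butter"},
    {"bread"},
]

count, support = pattern_support(transactions, {"eggs", "milk"})
print(count, support)  # 3 0.6
```

This matches the eggs-and-milk example in the table: 3 of 5 transactions, or 60%.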
The rule table has one row for each rule. The following table describes its columns.
Column Name | Data Type | Description |
---|---|---|
antecedent_item_column | VARCHAR | Items in the antecedent of the rule. |
consequence_item_column | VARCHAR | Items in the consequence of the rule. |
count_of_antecedent | INTEGER | Count of items in the antecedent of the rule. |
count_of_consequence | INTEGER | Count of items in the consequence of the rule. |
cntb | BIGINT | Count of transactions that contain both the antecedent and consequence. |
cnt_antecedent | BIGINT | Count of transactions that contain the antecedent. |
cnt_consequence | BIGINT | Count of transactions that contain the consequence. |
score | DOUBLE PRECISION | Product of two conditional probabilities: (cntb / cnt_antecedent) * (cntb / cnt_consequence) |
support | DOUBLE PRECISION | Percentage of transactions that contain both the antecedent and consequence: cntb/t, where t is the number of transactions. For example, if eggs and milk were purchased together 3 times in 5 transactions, then the support value is 3/5, or 60%. |
confidence | DOUBLE PRECISION | Percentage of transactions that contain the antecedent that also contain the consequence: cntb / cnt_antecedent. For example, for the antecedent milk and consequence butter, if cntb=3 and cnt_antecedent=4, then the confidence value is 3/4, or 75%. In other words, 75% of the time, when a person buys milk, the person also buys butter. |
lift | DOUBLE PRECISION | Ratio of the observed support value to the expected support value if the antecedent and consequence are independent: (cntb/t) / ((cnt_antecedent/t) * (cnt_consequence/t)) |
conviction | DOUBLE PRECISION | More reliable alternative to confidence: (1-cnt_consequence/t) / (1-cntb/cnt_antecedent) |
leverage | DOUBLE PRECISION | Difference between the percentage of transactions that contain both the antecedent and consequence (cntb/t) and the expectation for cntb/t if the antecedent and consequence were statistically independent: (cntb/t) - ((cnt_antecedent/t) * (cnt_consequence/t)) |
coverage | DOUBLE PRECISION | Percentage of transactions in which the rule applies: cnt_antecedent/t. Another name for coverage is antecedent support. |
chi_square | DOUBLE PRECISION | Chi-squared test result, used to test the hypothesis that the antecedent and consequence are not associated. The formula follows this table. |
z_score | DOUBLE PRECISION | Significance of cntb, assuming that it follows a normal distribution: (cntb - mean(cntb)) / standard_deviation(cntb). If every cntb is the same, then standard_deviation(cntb) is 0, and the function does not compute z_score. |
The formula for the value of chi_square is:
(t * (cntb * (t + cntb - cnt_antecedent - cnt_consequence) - (cnt_antecedent - cntb) *
(cnt_consequence - cntb))**2) /
(cnt_antecedent * (t - cnt_antecedent) * cnt_consequence * (t - cnt_consequence))
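
To check values in the rule table by hand, the column formulas above can be transcribed directly into code. The following Python sketch is an illustration under stated assumptions, not the function's implementation. The helper names `rule_metrics` and `z_scores` are hypothetical. Returning `None` where a denominator is 0 is this sketch's choice, and using the population standard deviation for z_score is an assumption, because the table does not say which variant the function uses.

```python
import statistics


def rule_metrics(cntb, cnt_antecedent, cnt_consequence, t):
    """Compute per-rule measures from the four counts, per the table formulas."""
    support = cntb / t
    confidence = cntb / cnt_antecedent
    coverage = cnt_antecedent / t
    expected = coverage * (cnt_consequence / t)  # support expected under independence

    # Conviction divides by (1 - confidence); None here is this sketch's choice.
    conviction = None
    if confidence < 1:
        conviction = (1 - cnt_consequence / t) / (1 - confidence)

    # Chi-square, transcribed from the formula that follows the table.
    numerator = t * (
        cntb * (t + cntb - cnt_antecedent - cnt_consequence)
        - (cnt_antecedent - cntb) * (cnt_consequence - cntb)
    ) ** 2
    denominator = (
        cnt_antecedent * (t - cnt_antecedent)
        * cnt_consequence * (t - cnt_consequence)
    )
    chi_square = numerator / denominator if denominator else None

    return {
        "score": confidence * (cntb / cnt_consequence),
        "support": support,
        "confidence": confidence,
        "lift": support / expected,
        "conviction": conviction,
        "leverage": support - expected,
        "coverage": coverage,
        "chi_square": chi_square,
    }


def z_scores(cntb_values):
    """z_score of each rule's cntb across all rules (population std assumed)."""
    mean = statistics.mean(cntb_values)
    std = statistics.pstdev(cntb_values)
    if std == 0:
        # Every cntb is the same, so z_score is not computed.
        return [None] * len(cntb_values)
    return [(c - mean) / std for c in cntb_values]


# Hypothetical counts: 10 transactions, antecedent in 4, consequence in 5, both in 3.
metrics = rule_metrics(cntb=3, cnt_antecedent=4, cnt_consequence=5, t=10)
for name, value in metrics.items():
    print(name, value)
```

With these hypothetical counts, confidence is 3/4 and coverage is 4/10. A lift above 1 and a positive leverage both indicate that the antecedent and consequence occur together more often than independence would predict.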