The FPGrowth function outputs a message and either a pattern table, a rule table, or both (depending on the PatternsOrRules syntax element).
Output Message Schema
Column | Data Type | Description |
---|---|---|
output_information | VARCHAR | Reports that patterns and rules are stored in the tables specified by the OutputPatternsTable and OutputRulesTable syntax elements. |
OutputPatternsTable Schema
Column | Data Type | Description |
---|---|---|
group_by_column | Same as in InputTable | [Column appears once for each specified group_by_column.] Column copied from InputTable. |
pattern_target_column | VARCHAR | Pattern composed of transaction items. |
length_of_pattern | INTEGER | Number of items in pattern. |
count | BIGINT | Count of occurrence of pattern. |
support | DOUBLE PRECISION | Percentage of transactions that contain the pattern: count/t, where t is the number of transactions. For example, if eggs and milk were purchased together 3 times in 5 transactions, then the support value is 3/5, 60%. |
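The length_of_pattern, count, and support columns can be reproduced by hand for a single pattern. The following Python sketch is only an illustration of the definitions above, not the function's implementation; the helper name `pattern_stats` and the sample transactions are hypothetical, and support is returned as a fraction (0.6 for 60%).

```python
def pattern_stats(transactions, pattern):
    """Return (length_of_pattern, count, support) for one pattern."""
    items = set(pattern)
    # A transaction contains the pattern if it holds every item in it.
    count = sum(1 for transaction in transactions if items <= set(transaction))
    return len(items), count, count / len(transactions)


# Hypothetical data: eggs and milk appear together in 3 of 5 transactions.
transactions = [
    {"eggs", "milk", "bread"},
    {"eggs", "milk"},
    {"milk", "butter"},
    {"eggs", "milk", "butter"},
    {"bread"},
]

print(pattern_stats(transactions, ["eggs", "milk"]))  # (2, 3, 0.6)
```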
OutputRulesTable Schema
Column | Data Type | Description |
---|---|---|
antecedent_target_column | VARCHAR | Items in the antecedent of the rule. |
consequence_target_column | VARCHAR | Items in the consequence of the rule. |
count_of_antecedent | INTEGER | Count of items in the antecedent of the rule. |
count_of_consequence | INTEGER | Count of items in the consequence of the rule. |
cntb | BIGINT | Count of transactions that contain both the antecedent and consequence. |
cnt_antecedent | BIGINT | Count of transactions that contain the antecedent. |
cnt_consequence | BIGINT | Count of transactions that contain the consequence. |
score | DOUBLE PRECISION | Product of two conditional probabilities: (cntb / cnt_antecedent) * (cntb / cnt_consequence) |
support | DOUBLE PRECISION | Percentage of transactions that contain both the antecedent and consequence: cntb/t, where t is the number of transactions. For example, if eggs and milk were purchased together 3 times in 5 transactions, then the support value is 3/5, 60%. |
confidence | DOUBLE PRECISION | Percentage of transactions that contain the antecedent that also contain the consequence: cntb / cnt_antecedent. For example, for the antecedent milk and consequence butter, if cntb=3 and cnt_antecedent=4, then the confidence value is 3/4, 75%. In other words, 75% of the time, when a person buys milk, the person also buys butter. |
lift | DOUBLE PRECISION | Ratio of the observed support value to the expected support value if the antecedent and consequence are independent: (cntb/t) / ((cnt_antecedent/t) * (cnt_consequence/t)) |
conviction | DOUBLE PRECISION | More reliable alternative to confidence: (1-cnt_consequence/t) / (1-cntb/cnt_antecedent) |
leverage | DOUBLE PRECISION | Difference between the percentage of transactions that contain both the antecedent and consequence (cntb/t) and the expectation for cntb/t if the antecedent and consequence were statistically independent: (cntb/t) - ((cnt_antecedent/t) * (cnt_consequence/t)) |
coverage | DOUBLE PRECISION | Percentage of transactions in which the rule applies: cnt_antecedent/t. Another name for coverage is antecedent support. |
chi_square | DOUBLE PRECISION | Chi-squared test result, used to test the hypothesis that the antecedent and consequence are not associated. The formula appears under Formula for chi_square Value. |
z_score | DOUBLE PRECISION | Significance of cntb, assuming that it follows a normal distribution: (cntb - mean(cntb)) / standard_deviation(cntb). If every cntb is the same, then standard_deviation(cntb) is 0, and the function does not compute z_score. |
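To check how the count columns relate to the derived metrics, the formulas in the table can be evaluated directly. The Python sketch below is an illustration under stated assumptions, not the function's implementation: the names `rule_metrics` and `z_scores` are hypothetical, values are returned as fractions, conviction is reported as infinity when confidence is 1, and z_score uses the population standard deviation (the table does not say which variant the function uses).

```python
import math
import statistics


def rule_metrics(cntb, cnt_antecedent, cnt_consequence, t):
    """Compute the count-based rule metrics as fractions (0.6 means 60%)."""
    support = cntb / t
    confidence = cntb / cnt_antecedent
    coverage = cnt_antecedent / t
    # Support expected if antecedent and consequence were independent.
    expected = coverage * (cnt_consequence / t)
    return {
        "score": confidence * (cntb / cnt_consequence),
        "support": support,
        "confidence": confidence,
        "lift": support / expected,
        # Assumption: report infinity when confidence is 1 (zero denominator).
        "conviction": (
            math.inf
            if cntb == cnt_antecedent
            else (1 - cnt_consequence / t) / (1 - confidence)
        ),
        "leverage": support - expected,
        "coverage": coverage,
    }


def z_scores(cntb_values):
    """Return one z_score per rule, or None when every cntb is the same."""
    mean = statistics.fmean(cntb_values)
    # Assumption: population standard deviation.
    std = statistics.pstdev(cntb_values)
    if std == 0:
        return None
    return [(cntb - mean) / std for cntb in cntb_values]


# milk -> butter with cntb=3 and cnt_antecedent=4, as in the confidence row;
# cnt_consequence=3 and t=5 are hypothetical values chosen for illustration.
m = rule_metrics(cntb=3, cnt_antecedent=4, cnt_consequence=3, t=5)
print(m["confidence"])  # 0.75
```

With these counts, lift comes out above 1 and leverage above 0, which is the pattern expected when the antecedent and consequence occur together more often than independence would predict.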
Formula for chi_square Value
(t * (cntb * (t + cntb - cnt_antecedent - cnt_consequence) - (cnt_antecedent - cntb) *
(cnt_consequence - cntb))**2) /
(cnt_antecedent * (t - cnt_antecedent) * cnt_consequence * (t - cnt_consequence))
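The formula above is the standard chi-squared statistic for a 2x2 contingency table whose cells count transactions with both sides of the rule, the antecedent only, the consequence only, and neither. The Python sketch below spells out those cells; the function name `chi_square` is hypothetical, and returning None when the denominator is 0 is an assumption, since the behavior of FPGrowth in that case is not described here.

```python
def chi_square(cntb, cnt_antecedent, cnt_consequence, t):
    """Chi-squared statistic for the 2x2 antecedent/consequence table."""
    both = cntb
    antecedent_only = cnt_antecedent - cntb
    consequence_only = cnt_consequence - cntb
    neither = t + cntb - cnt_antecedent - cnt_consequence

    numerator = t * (both * neither - antecedent_only * consequence_only) ** 2
    denominator = (
        cnt_antecedent * (t - cnt_antecedent)
        * cnt_consequence * (t - cnt_consequence)
    )
    # Assumption: the statistic is undefined when a marginal count is 0 or t.
    if denominator == 0:
        return None
    return numerator / denominator


# Hypothetical counts: cntb=3, cnt_antecedent=4, cnt_consequence=3, t=5.
print(chi_square(3, 4, 3, 5))  # 1.875
```

When the observed cntb equals the count expected under independence (for example cntb=2, cnt_antecedent=4, cnt_consequence=5, t=10), the numerator is 0 and the statistic is 0.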