FPGrowth Background - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.10
1.1
Published
October 2019
Language
English (United States)
Last Update
2019-12-31
dita:mapPath
ima1540829771750.ditamap
dita:ditavalPath
jsj1481748799576.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

The purpose of association rule mining is to identify strong rules in databases, using different measures of interestingness, and then discover regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, the association rule {onions, potatoes}=>{hamburger} indicates that if a customer buys onions and potatoes together, he or she is likely to buy hamburger. Unlike sequence mining, association rule mining typically does not consider the order of items, either within a transaction or across transactions.

Association rules are used in making marketing decisions, such as when to offer promotional pricing or where to place products. They are also useful in many other areas, such as Web usage mining, intrusion detection, continuous production, and bioinformatics. However, association rule mining in large databases is computationally expensive, especially when there is a large number of patterns.

An efficient, scalable method for mining the complete set of frequent patterns from a data set is the frequent-pattern growth (FP-growth) algorithm. The FP-growth algorithm stores crucial information about frequent pattern growth in a frequent-pattern tree (FP-tree), an extended prefix-tree structure of compressed information, and then uses a divide-and-conquer strategy. Because the FP-tree retains the item set association information, the FP-growth algorithm does not do candidate generations, which are computationally expensive.

The FPGrowth function does the following:
  1. Divides a large data set into independent partitions.
  2. Compresses the data in each partition, creating an FP-tree to represent association information between items.
  3. Divides the FP-tree into conditional databases, each associated with one frequent pattern.
  4. Mines each conditional database for patterns.
  5. Creates association rules from the patterns.

    To create an association rule from a pattern, the function takes one or more items in the pattern as the consequence and the remaining items as the antecedent. In the rule {onions, potatoes}=>{hamburger}, {onions, potatoes} is the antecedent and {hamburger} is the consequence.

  6. Determines the interestingness of the association rules.

The FPGrowth function automatically truncates long transactions by removing low-frequency items, guaranteeing that a single transaction creates at most 1 million patterns. Automatic truncation depends only on the value of the MaxPatternLength syntax element.