In a sequential pattern mining application, each sequence is an ordered list of item sets, and each item set contains at least one item. Items within a set are unordered.
In web clickstream analysis, each item set contains only one item. In purchase behavior analysis, an item set contains every item that the customer buys in one shopping session.
In sequential pattern mining, sequence α is a subsequence of sequence β if both of the following are true:
- Each item set ai in α is a subset of an item set bj in β.
- The ai elements in α have the same order as the bj elements in β.
More formally: sequence α=<a1a2...an> is a subsequence of sequence β=<b1b2...bm>, and β is a super sequence of α, if there exist integers 1≤j1<j2<...<jn≤m such that a1⊆bj1, a2⊆bj2, ..., an⊆bjn.
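The subsequence test can be sketched in a few lines of Python. This is an illustration only, under the assumption that a sequence is represented as a list of Python sets; the function name `is_subsequence` and the sample data are hypothetical and not part of any particular library. The sketch matches each item set of α to the earliest possible item set of β, which is enough to decide whether suitable indices j1<j2<...<jn exist.

```python
from typing import List, Set

Sequence = List[Set[str]]  # assumed representation: ordered list of item sets


def is_subsequence(alpha: Sequence, beta: Sequence) -> bool:
    """Return True if alpha is a subsequence of beta."""
    j = 0  # next position in beta that may still be used
    for a in alpha:
        # Advance until an item set of beta contains every item of a.
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False  # no remaining item set of beta contains a
        j += 1  # indices must be strictly increasing
    return True


beta = [{"a", "d"}, {"e"}, {"b", "c", "f"}]
in_order = is_subsequence([{"a"}, {"b", "c"}], beta)       # order preserved
out_of_order = is_subsequence([{"b", "c"}, {"a"}], beta)   # order violated
print(in_order, out_of_order)
```

In this example the first call succeeds because {a}⊆{a,d} and {b,c}⊆{b,c,f} appear in that order, while the second call fails because no item set containing a follows the one containing b and c.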
The support of sequence α in a sequence data set SDB is defined as the number of sequences in SDB that contain α (that is, the number of sequences in SDB that are super sequences of α).
Given a sequence data set SDB and a threshold T, sequence α is called a frequent sequential pattern of SDB if support(α)≥T. The problem of sequential pattern mining is to find all frequent sequential patterns, given a sequence data set SDB and a threshold T.
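The definitions of support and frequent sequential pattern can be illustrated with a minimal Python sketch. It assumes that SDB is a list of sequences and that each sequence is a list of sets; the names `contains`, `support`, and `is_frequent`, as well as the shopping data, are hypothetical. The sketch only counts support for one candidate pattern; it does not enumerate all frequent patterns, which is what an actual mining algorithm does.

```python
from typing import List, Set

Sequence = List[Set[str]]  # assumed representation: ordered list of item sets


def contains(beta: Sequence, alpha: Sequence) -> bool:
    """Return True if beta is a super sequence of alpha."""
    remaining = iter(beta)
    # Each any() call resumes where the previous one stopped, so the
    # matched item sets of beta appear in strictly increasing order.
    return all(any(a <= b for b in remaining) for a in alpha)


def support(alpha: Sequence, sdb: List[Sequence]) -> int:
    """Number of sequences in sdb that contain alpha."""
    return sum(1 for beta in sdb if contains(beta, alpha))


def is_frequent(alpha: Sequence, sdb: List[Sequence], threshold: int) -> bool:
    """True if support(alpha) >= threshold."""
    return support(alpha, sdb) >= threshold


# Hypothetical data: three customers, one item set per shopping session.
sdb = [
    [{"bread"}, {"milk", "eggs"}, {"butter"}],
    [{"bread", "jam"}, {"milk"}],
    [{"milk"}, {"bread"}],
]
alpha = [{"bread"}, {"milk"}]

print(support(alpha, sdb))         # 2: the third customer buys milk before bread
print(is_frequent(alpha, sdb, 2))  # True
print(is_frequent(alpha, sdb, 3))  # False
```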