FrequentPaths Syntax Elements - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 9.02, 9.01, 2.0, 1.3
Published: February 2022
Language: English (United States)
Last Update: 2022-02-10
dita:id
B700-4003
OutputTable
Specify the name of the table where the function outputs the subsequences.
SeqPatternTable
[Optional] Specify the name of the table where the function outputs sequence-pattern pairs. For example, if a sequence has a partition value of "1" and contains 3 patterns with IDs 2, 9, and 10, for that sequence the function outputs the sequence-pattern pairs ("1", 2), ("1", 9), and ("1", 10).
PartitionColumns
Specify the names of the columns that comprise the partition key of the InputTable sequences.
TimeColumn
[Required if you specify ItemColumn or ItemDefinitionColumns.] Specify the name of the InputTable column that determines the order of items in a sequence. Items in the same sequence that have the same time stamp belong to the same set.
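The same-time-stamp rule can be illustrated with a short Python sketch. It is not part of the FrequentPaths function: the row layout (time stamp, item) and the helper name build_item_sets are assumptions made only for this example.

```python
from itertools import groupby

def build_item_sets(rows):
    """Group the (time_stamp, item) rows of ONE sequence into ordered item sets.

    Hypothetical helper: rows that share a time stamp end up in the same set.
    """
    ordered = sorted(rows, key=lambda row: row[0])  # order items by time stamp
    return [
        sorted({item for _, item in group})         # one item set per time stamp
        for _, group in groupby(ordered, key=lambda row: row[0])
    ]

rows = [(1, "a"), (1, "c"), (2, "e"), (3, "f"), (3, "d")]
print(build_item_sets(rows))  # [['a', 'c'], ['e'], ['d', 'f']]
```

The result corresponds to the sequence written as "(a, c), e, (f, d)" in the PathFilters examples.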
PathFilters
[Optional] Specify the filters to use on the InputTable sequences. Only InputTable sequences that satisfy all constraints of at least one filter are input to the function.
Each filter has one or more constraints, which are separated by spaces. Each constraint has this syntax:
constraint (item [symbol ...])
constraint: Description

STW (start-with_constraint): The first item set of the sequence must contain at least one of the specified items.

For example, STW(c,d) requires the first item set of the sequence to contain c or d. The sequence "(a, c), e, (f, d)" meets this constraint because its first item set, (a, c), contains c.

EDW (end-with_constraint): The last item set of the sequence must contain at least one of the specified items.

For example, EDW(f,g) requires the last item set of the sequence to contain f or g. The sequence "(a, b), e, (f, d)" meets this constraint because its last item set, (f, d), contains f.

CTN (containing_constraint): The sequence must contain at least one of the specified items.

For example, CTN(a,b) requires the sequence to contain a or b. The sequence "(a, c), d, (e, f)" meets this constraint, but the sequence "d, (e, f)" does not.

Constraints in the same filter must be of different types (at most one STW, one EDW, and one CTN per filter). For example:
  • Valid: 'STW(c,d) EDW(g,k) CTN(e)'
  • Invalid: 'STW(c,d) STW(e,h)'
This syntax element specifies a separator and uses it in two filters:
PathFilters('Separator(#)', 'STW(c#d) EDW(g#k) CTN(e)', 'CTN(h#k)')
If you specify a separator symbol with Separator(symbol), it applies to all filters. Default symbol: comma (,)
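The filter semantics described above (a sequence is kept if it satisfies all constraints of at least one filter) can be sketched in Python. This is an illustration only, not the function's implementation: the helper names parse_filter, satisfies, and passes_path_filters are hypothetical, and a sequence is assumed to be represented as a list of item sets.

```python
import re

def parse_filter(filter_text, separator=","):
    """Parse e.g. 'STW(c,d) EDW(g,k) CTN(e)' into {'STW': {'c', 'd'}, ...}."""
    constraints = {}
    for kind, body in re.findall(r"(STW|EDW|CTN)\(([^)]*)\)", filter_text):
        if kind in constraints:  # the same constraint type twice is invalid
            raise ValueError(f"duplicate {kind} constraint in one filter")
        constraints[kind] = {item.strip() for item in body.split(separator)}
    return constraints

def satisfies(sequence, constraints):
    """sequence is a list of item sets, e.g. [{'a', 'c'}, {'e'}, {'f', 'd'}]."""
    if not sequence:
        return False
    checks = {
        "STW": lambda items: bool(sequence[0] & items),   # first item set
        "EDW": lambda items: bool(sequence[-1] & items),  # last item set
        "CTN": lambda items: any(s & items for s in sequence),
    }
    return all(checks[kind](items) for kind, items in constraints.items())

def passes_path_filters(sequence, filters, separator=","):
    """True if the sequence meets ALL constraints of AT LEAST ONE filter."""
    return any(satisfies(sequence, parse_filter(f, separator)) for f in filters)

seq = [{"a", "c"}, {"e"}, {"f", "d"}]  # the sequence "(a, c), e, (f, d)"
print(passes_path_filters(seq, ["STW(c,d) EDW(f,g)"]))  # True
print(passes_path_filters(
    seq, ["STW(c#d) EDW(g#k) CTN(e)", "CTN(h#k)"], separator="#"))  # False
```

In the second call the sequence fails the first filter because its last item set contains neither g nor k, and it fails the second filter because it contains neither h nor k.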
GroupByColumns
[Optional] Specify the names of the InputTable columns by which to group the InputTable sequences. If you specify this syntax element, the function operates on each group separately and copies each group_column to the output table.
ItemColumn
[Required if you specify neither ItemDefinitionColumns nor PathColumn.] Specify the names of the InputTable columns that contain the items.
PathColumn
[Required if you specify neither ItemDefinitionColumns nor ItemColumn.] Specify the name of the InputTable column that contains paths in the form of sequence strings. A sequence string has this syntax:
'[item [, ...]]'

In the sequence string syntax, you must type the outer brackets literally; they are part of the string, not optional-element notation. The sequence strings in this column can be created by the nPath function.

If you specify this syntax element, each item set can have only one item.
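To make the sequence string format concrete, the following Python sketch splits such a string into its items. The helper name parse_sequence_string is hypothetical, and the sketch assumes that item values do not themselves contain commas.

```python
def parse_sequence_string(text):
    """Split a sequence string such as '[a, b, c]' into ['a', 'b', 'c'].

    Hypothetical helper; assumes items contain no commas. The outer
    brackets are required, as stated in the syntax description.
    """
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("sequence string must be enclosed in [ ]")
    inner = text[1:-1].strip()
    # Each item forms its own single-item set when PathColumn is used.
    return [item.strip() for item in inner.split(",")] if inner else []

print(parse_sequence_string("[a, b, c]"))  # ['a', 'b', 'c']
```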

ItemDefinitionColumns
[Required if you specify neither ItemColumn nor PathColumn.] Specify the names of the index, definition, and item columns of the ItemDefinitionTable.
MinSupport
Specify the minimum support value, minimum, which determines the threshold for whether a sequential pattern is frequent. The minimum must be a positive real number.

If minimum is in the range (0,1], it is a relative threshold: If N is the total number of input sequences, the threshold is T=N*minimum. For example, if there are 1000 sequences in the InputTable and minimum is 0.05, the threshold is 50.

If minimum is in the range (1,+∞), it is an absolute threshold: Regardless of N, T=minimum. For example, if minimum is 50, the threshold is 50, regardless of N.

A pattern is frequent if its support value is at least T.

Because the function outputs only frequent patterns, minimum controls the number of output patterns. If minimum is small, processing time increases exponentially; therefore, Teradata recommends starting with a larger value (for example, 5% of the total number of sequences if you know N, and 0.05 otherwise).

If you specify a relative minimum and GroupByColumns, the function calculates N and T for each group.

If you specify a relative minimum and PathFilters, N is the number of sequences that meet the constraints of the filters.
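The relative and absolute threshold rules can be summarized in a short Python sketch. The function names support_threshold and is_frequent and the group sizes are hypothetical; the sketch only restates the rules above and is not the function's implementation.

```python
def support_threshold(minimum, n_sequences):
    """Return T, the support a pattern needs to count as frequent."""
    if minimum <= 0:
        raise ValueError("minimum must be a positive real number")
    if minimum <= 1:
        return n_sequences * minimum  # relative threshold: T = N * minimum
    return minimum                    # absolute threshold: T = minimum

def is_frequent(support, minimum, n_sequences):
    """A pattern is frequent if its support value is at least T."""
    return support >= support_threshold(minimum, n_sequences)

# Relative threshold: about 50 for 1000 sequences and minimum 0.05.
print(support_threshold(0.05, 1000))
# Absolute threshold: 50 regardless of N.
print(support_threshold(50, 1000))

# With GroupByColumns and a relative minimum, N and T are per group.
# The group sizes below are made up for illustration.
group_sizes = {"group_a": 1000, "group_b": 200}
thresholds = {g: support_threshold(0.05, n) for g, n in group_sizes.items()}
print(thresholds)
```

With PathFilters, n_sequences would be the number of sequences that pass the filters rather than the total number of InputTable sequences.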

MaxLength
[Optional] Specify the maximum length of the output sequential patterns. The length of a pattern is its number of sets.
Default: Maximum INTEGER value
MinLength
[Optional] Specify the minimum length of the output sequential patterns.
Default: 1
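Because pattern length counts item sets rather than individual items, the effect of MinLength and MaxLength can be sketched as follows. The helper name within_length is hypothetical, and the default upper bound assumes that the maximum INTEGER value is the 32-bit signed maximum, 2147483647.

```python
MAX_INTEGER = 2**31 - 1  # assumption: 32-bit signed INTEGER maximum

def within_length(pattern, min_length=1, max_length=MAX_INTEGER):
    """pattern is a list of item sets; its length is the number of sets."""
    return min_length <= len(pattern) <= max_length

pattern = [{"a", "c"}, {"e"}]  # three items, but only two sets
print(within_length(pattern, min_length=2, max_length=2))  # True
print(within_length(pattern, min_length=3))                # False
```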
ClosedPattern
[Optional] Specify whether to output only closed patterns.
Default: 'false'