FrequentPaths Arguments - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.00, 1.0
Published: May 2019
Language: English (United States)
Last Update: 2019-11-22
dita:id: B700-4003
Product Category: Teradata Vantage™
OutputTable
Specify the name of the table where the function outputs the subsequences.
SeqPatternTable
[Optional] Specify the name of the table where the function outputs sequence-pattern pairs. For example, if a sequence has a partition value of "1" and contains 3 patterns with IDs 2, 9, and 10, for that sequence the function outputs the sequence-pattern pairs ("1", 2), ("1", 9), and ("1", 10).
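The following Python sketch only illustrates the shape of the SeqPatternTable rows in the example above. It does not show how the function computes them. It assumes a hypothetical in-memory dictionary, `patterns_by_sequence`, that maps each sequence's partition value to the IDs of the patterns the sequence contains.

```python
def sequence_pattern_pairs(patterns_by_sequence: dict[str, list[int]]) -> list[tuple[str, int]]:
    """Flatten {partition value: [pattern IDs]} into (partition value, pattern ID) rows.

    patterns_by_sequence is a hypothetical structure used only for illustration.
    """
    return [
        (partition, pattern_id)
        for partition, pattern_ids in patterns_by_sequence.items()
        for pattern_id in pattern_ids
    ]


# The sequence with partition value "1" contains patterns 2, 9, and 10.
print(sequence_pattern_pairs({"1": [2, 9, 10]}))
# [('1', 2), ('1', 9), ('1', 10)]
```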
PartitionColumns
Specify the names of the columns that comprise the partition key of the input sequences.
TimeColumn
[Required when ItemColumn or ItemDefinitionColumns is specified.] Specify the name of the input table column that determines the order of items in a sequence. Items in the same sequence that have the same time stamp belong to the same item set.
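To make the grouping rule concrete, here is a minimal Python sketch. It is not Teradata code, and it assumes the input arrives as hypothetical `(partition, time, item)` tuples. Items that share a partition value and a time stamp go into one item set, and the item sets are ordered by time stamp.

```python
from collections import defaultdict


def build_sequences(rows):
    """Group (partition, time, item) rows into ordered sequences of item sets.

    Returns {partition: [item set, ...]} with item sets sorted by time stamp.
    """
    # partition -> time stamp -> set of items observed at that time stamp
    by_partition = defaultdict(lambda: defaultdict(set))
    for partition, time, item in rows:
        by_partition[partition][time].add(item)
    return {
        partition: [frozenset(sets_by_time[t]) for t in sorted(sets_by_time)]
        for partition, sets_by_time in by_partition.items()
    }


rows = [(1, 1, "a"), (1, 1, "c"), (1, 2, "e"), (1, 3, "f"), (1, 3, "d")]
sequences = build_sequences(rows)
print([sorted(item_set) for item_set in sequences[1]])
# [['a', 'c'], ['e'], ['d', 'f']]
```

These rows produce the sequence "(a, c), e, (f, d)": a and c share time stamp 1, and f and d share time stamp 3.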
PathFilters
[Optional] Specify the filters to use on the input table sequences. Only input table sequences that satisfy all constraints of at least one filter are input to the function.
Each filter has one or more constraints, which are separated by spaces. Each constraint has this syntax:
constraint (item [symbol ...])
The constraint can be one of the following:
STW (start-with constraint): The first item set of the sequence must contain at least one of the specified items.

For example, STW(c,d) requires the first item set of the sequence to contain c or d. The sequence "(a, c), e, (f, d)" meets this constraint because its first item set, (a, c), contains c.

EDW (end-with constraint): The last item set of the sequence must contain at least one of the specified items.

For example, EDW(f,g) requires the last item set of the sequence to contain f or g. The sequence "(a, b), e, (f, d)" meets this constraint because its last item set, (f, d), contains f.

CTN (containing constraint): The sequence must contain at least one of the specified items.

For example, CTN(a,b) requires the sequence to contain a or b. The sequence "(a, c), d, (e, f)" meets this constraint, but the sequence "d, (e, f)" does not.

Constraints in the same filter must be of different types. That is, each of STW, EDW, and CTN can appear at most once in a filter. For example:
  • Valid: 'STW(c,d) EDW(g,k) CTN(e)'
  • Invalid: 'STW(c,d) STW(e,h)'
The following argument specifies the separator # and uses it in two filters:
PathFilters('Separator(#)', 'STW(c#d) EDW(g#k) CTN(e)', 'CTN(h#k)')
If you specify a separator symbol, it applies to all filters. Default symbol: comma (,)
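The filtering rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the function's implementation. Sequences are modeled as lists of item sets, items are assumed not to contain parentheses, and `parse_filters`, `satisfies`, and `passes_path_filters` are hypothetical helper names.

```python
import re

# One constraint: a type name followed by a parenthesized item list.
# Text that does not match this pattern is ignored in this sketch.
CONSTRAINT_RE = re.compile(r"(STW|EDW|CTN)\(([^)]*)\)")


def parse_filters(args):
    """Parse PathFilters-style strings into one {constraint: item set} dict per filter.

    An optional leading 'Separator(x)' sets the item separator for every filter;
    otherwise the separator is a comma.
    """
    separator = ","
    if args and args[0].startswith("Separator(") and args[0].endswith(")"):
        separator = args[0][len("Separator("):-1]
        args = args[1:]
    filters = []
    for text in args:
        constraints = {}
        for match in CONSTRAINT_RE.finditer(text):
            name, items = match.groups()
            if name in constraints:  # constraints in one filter must differ
                raise ValueError(f"constraint {name} appears twice in {text!r}")
            constraints[name] = {item.strip() for item in items.split(separator)}
        filters.append(constraints)
    return filters


def satisfies(sequence, constraints):
    """True if the sequence meets every constraint of a single filter."""
    checks = {
        "STW": lambda items: bool(sequence) and bool(sequence[0] & items),
        "EDW": lambda items: bool(sequence) and bool(sequence[-1] & items),
        "CTN": lambda items: any(item_set & items for item_set in sequence),
    }
    return all(checks[name](items) for name, items in constraints.items())


def passes_path_filters(sequence, filters):
    """A sequence is kept if it satisfies all constraints of at least one filter."""
    return any(satisfies(sequence, constraints) for constraints in filters)


filters = parse_filters(["Separator(#)", "STW(c#d) EDW(g#k) CTN(e)", "CTN(h#k)"])
print(passes_path_filters([{"c"}, {"e"}, {"g"}], filters))  # True: first filter
print(passes_path_filters([{"a"}, {"k"}], filters))         # True: second filter
print(passes_path_filters([{"a"}, {"b"}], filters))         # False
```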
GroupByColumns
[Optional] Specify the names of the input table columns by which to group the input table sequences. If you specify this argument, the function operates on each group separately and copies each group_column to the output table.
ItemColumn
[Required if you specify neither ItemDefinitionColumns nor PathColumn.] Specify the names of the input table columns that contain the items.
PathColumn
[Required if you specify neither ItemDefinitionColumns nor ItemColumn.] Specify the name of the input table column that contains paths in the form of sequence strings. A sequence string has this syntax:
'[item [, ...]]'

In the sequence string syntax, the outer brackets are literal characters that you must type; the inner brackets enclose optional additional items. The sequence strings in this column can be created by the nPath function.

If you specify this argument, each item set can have only one item.
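As an illustration, the following Python sketch turns a sequence string into a sequence of one-item sets, which is the shape implied by the PathColumn rule above. It assumes that items contain no commas or brackets. `parse_sequence_string` is a hypothetical helper and is not part of the product.

```python
def parse_sequence_string(path: str) -> list[frozenset[str]]:
    """Convert a sequence string such as '[home, search, cart]' into one-item sets.

    Assumes items contain no commas or brackets.
    """
    text = path.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"sequence string must be enclosed in brackets: {path!r}")
    body = text[1:-1].strip()
    if not body:
        return []  # '[]' holds no items
    # Each item becomes its own item set, because a path cannot group items.
    return [frozenset({item.strip()}) for item in body.split(",")]


print([sorted(s) for s in parse_sequence_string("[home, search, cart]")])
# [['home'], ['search'], ['cart']]
```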

ItemDefinitionColumns
[Required if you specify neither ItemColumn nor PathColumn.] Specify the names of the index, definition, and item columns of the item_definition_table.
MinSupport
Specify the threshold that determines whether a sequential pattern is frequent. The minimum must be a positive real number.

If minimum is in the range (0,1], it is a relative threshold: If N is the total number of input sequences, the threshold is T=N*minimum. For example, if there are 1000 sequences in the input table and minimum is 0.05, the threshold is 50.

If minimum is in the range (1,+∞), it is an absolute threshold: Regardless of N, T=minimum. For example, if minimum is 50, the threshold is 50, regardless of N.

A pattern is frequent if its support value is at least T.
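The threshold rule can be written as a small Python function. This is a sketch of the arithmetic described above, and `support_threshold` and `is_frequent` are hypothetical names. How the function computes each pattern's support value is not covered here.

```python
def support_threshold(minimum: float, n_sequences: int) -> float:
    """Return the MinSupport threshold T.

    minimum in (0, 1]: relative threshold, T = N * minimum.
    minimum > 1: absolute threshold, T = minimum regardless of N.
    """
    if minimum <= 0:
        raise ValueError("minimum must be a positive real number")
    return n_sequences * minimum if minimum <= 1 else float(minimum)


def is_frequent(support: int, minimum: float, n_sequences: int) -> bool:
    """A pattern is frequent if its support value is at least T."""
    return support >= support_threshold(minimum, n_sequences)


print(support_threshold(0.05, 1000))  # 50.0 (relative)
print(support_threshold(50, 1000))    # 50.0 (absolute)
print(is_frequent(49, 0.05, 1000))    # False
```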

Because the function outputs only frequent patterns, minimum controls the number of output patterns. If minimum is small, processing time increases exponentially. Therefore, Teradata recommends starting with a larger value, for example, 5% of the total number of sequences if you know N, and 0.05 otherwise.

If you specify a relative minimum and GroupByColumns, the function calculates N and T for each group.

If you specify a relative minimum and PathFilters, N is the number of sequences that meet the constraints of the filters.
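The following Python sketch shows how a relative minimum yields a separate threshold for each group, counting only sequences that pass the path filters. The `(group, sequence)` input pairs and the `passes_filters` predicate, which stands in for PathFilters, are hypothetical. For simplicity, the sketch rejects an absolute minimum (greater than 1), because that threshold does not depend on N.

```python
from collections import Counter


def per_group_thresholds(grouped_sequences, minimum, passes_filters=lambda seq: True):
    """Compute T = N * minimum separately for each group (relative minimum only).

    grouped_sequences: iterable of (group, sequence) pairs.
    N for a group counts only its sequences that satisfy passes_filters.
    Groups with no qualifying sequences are omitted from the result.
    """
    if not 0 < minimum <= 1:
        raise ValueError("this sketch handles only a relative minimum in (0, 1]")
    n_per_group = Counter(
        group for group, sequence in grouped_sequences if passes_filters(sequence)
    )
    return {group: n * minimum for group, n in n_per_group.items()}


def contains_x(sequence):
    """Hypothetical filter: keep sequences with an item set containing 'x'."""
    return any("x" in item_set for item_set in sequence)


data = [
    ("A", [{"x"}]), ("A", [{"y"}]), ("A", [{"x", "y"}]), ("A", [{"z"}]),
    ("B", [{"x"}]), ("B", [{"z"}]),
]
print(per_group_thresholds(data, 0.5))              # {'A': 2.0, 'B': 1.0}
print(per_group_thresholds(data, 0.5, contains_x))  # {'A': 1.0, 'B': 0.5}
```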

MaxLength
[Optional] Specify the maximum length of the output sequential patterns. The length of a pattern is its number of item sets.
Default: No maximum length
MinLength
[Optional] Specify the minimum length of the output sequential patterns.
Default: 1
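A minimal Python sketch of how MinLength and MaxLength bound the output follows. It assumes that patterns are represented as lists of item sets. `within_length` is a hypothetical helper, and `max_length=None` stands for the default of no maximum length. Length counts item sets, not items.

```python
from typing import Optional


def within_length(pattern, min_length: int = 1, max_length: Optional[int] = None) -> bool:
    """True if the pattern's number of item sets lies in [min_length, max_length]."""
    length = len(pattern)  # number of item sets, not number of items
    return length >= min_length and (max_length is None or length <= max_length)


pattern = [frozenset({"a", "c"}), frozenset({"e"})]  # three items, length 2
print(within_length(pattern))                # True: defaults are 1 and no maximum
print(within_length(pattern, max_length=1))  # False
print(within_length(pattern, min_length=3))  # False
```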
ClosedPattern
[Optional] Specify whether to output only closed patterns.
Default: 'false'
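This page does not define a closed pattern. In the sequential-pattern-mining literature, a frequent pattern is usually called closed when no proper super-pattern has the same support, and support is the number of input sequences that contain the pattern. The Python sketch below follows that common definition as an assumption, not as a description of how this function decides closedness. The helpers `contains`, `support`, and `closed_patterns` are hypothetical. Patterns are tuples of frozensets, so they can be used as dictionary keys.

```python
def contains(sequence, pattern):
    """True if pattern occurs in sequence in order.

    Each item set of the pattern must be a subset of a distinct item set of the
    sequence, in the same order. Greedy earliest matching suffices for this check.
    """
    i = 0
    for item_set in sequence:
        if i < len(pattern) and pattern[i] <= item_set:
            i += 1
    return i == len(pattern)


def support(pattern, sequences):
    """Number of sequences that contain the pattern (assumed definition)."""
    return sum(contains(sequence, pattern) for sequence in sequences)


def closed_patterns(frequent, sequences):
    """Keep patterns that have no proper super-pattern with equal support.

    Assumes `frequent` holds every frequent pattern. A super-pattern with equal
    support is itself frequent, so searching within `frequent` is enough.
    """
    supports = {p: support(p, sequences) for p in frequent}
    return [
        p for p in frequent
        if not any(q != p and contains(q, p) and supports[q] == supports[p]
                   for q in frequent)
    ]


a, b, ab = frozenset("a"), frozenset("b"), frozenset("ab")
sequences = [(ab, frozenset("c")), (ab, frozenset("d"))]
frequent = [(a,), (b,), (ab,)]  # each has support 2
print(closed_patterns(frequent, sequences) == [(ab,)])  # True
```

Here, (a) and (b) are not closed because the longer item set (a, b) appears in exactly the same sequences.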