7.00.02 - FrequentPaths Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product: Aster Analytics
Release Number: 7.00.02
Release Date: September 2017
Content Type: Programming Reference, User Guide
Publication ID: B700-1022-700K
Language: English (United States)

InputTable
Specifies the name of the table that contains the input sequences. Each row is one item in a sequence. If input_table does not include a schema, the function searches for it in the user's search path. The function ignores rows that contain any NULL values.
OutputTable
Specifies the name of the table where the function outputs the subsequences.
PartitionColumns
Specifies the names of the columns that comprise the partition key of the input sequences.
TimeColumn
[Required when ItemColumn or ItemDefinition is specified.] Specifies the name of the input table column that determines the order of items in a sequence. Items in the same sequence that have the same time stamp belong to the same set.
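
As an illustration of how PartitionColumns and TimeColumn shape the input, the following Python sketch groups rows into sequences of item sets. It is a minimal sketch under stated assumptions, not the function's implementation: rows are modeled as dictionaries, a single item column is assumed, and the names build_sequences, user, ts, and page are hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from typing import Any, Dict, FrozenSet, List, Sequence, Tuple

Row = Dict[str, Any]


def build_sequences(
    rows: Sequence[Row],
    partition_cols: Sequence[str],
    time_col: str,
    item_col: str,
) -> Dict[Tuple[Any, ...], List[FrozenSet[Any]]]:
    """Hypothetical helper: group rows into ordered sequences of item sets."""
    partitions: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        # Mirror the documented behavior of ignoring rows with any NULL value.
        if any(value is None for value in row.values()):
            continue
        key = tuple(row[col] for col in partition_cols)
        partitions[key].append(row)

    sequences: Dict[Tuple[Any, ...], List[FrozenSet[Any]]] = {}
    for key, part_rows in partitions.items():
        # Order items by the time column; equal time stamps form one item set.
        part_rows.sort(key=lambda r: r[time_col])
        sequences[key] = [
            frozenset(r[item_col] for r in same_time)
            for _, same_time in groupby(part_rows, key=lambda r: r[time_col])
        ]
    return sequences


rows = [
    {"user": 1, "ts": 1, "page": "a"},
    {"user": 1, "ts": 1, "page": "c"},
    {"user": 1, "ts": 2, "page": "e"},
    {"user": 1, "ts": 3, "page": "f"},
    {"user": 1, "ts": 3, "page": "d"},
    {"user": 2, "ts": 1, "page": None},  # ignored: contains a NULL
]
print(build_sequences(rows, ["user"], "ts", "page"))
```

Under these assumptions, partition 1 becomes the sequence "(a, c), e, (f, d)", and user 2 produces no sequence because its only row contains a NULL.
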
PathFilters
[Optional] Specifies the filters to use on the input table sequences. Only input table sequences that satisfy all constraints of at least one filter are input to the function.

Each filter has one or more constraints, which are separated by spaces. Each constraint has this syntax:

constraint(item[symbol item]...)

The default symbol is a comma (,). To use a different symbol, specify 'Separator(symbol)' as one of the PathFilters values; the symbol then applies to all filters. The constraint is one of the following:

  • STW (start-with constraint)

    The first item set of the sequence must contain at least one of the specified items. For example, STW(c,d) requires the first item set of the sequence to contain c or d. Sequence "(a, c), e, (f, d)" meets this constraint because the first item set, (a, c), contains c.

  • EDW (end-with constraint)

    The last item set of the sequence must contain at least one of the specified items. For example, EDW(f,g) requires the last item set of the sequence to contain f or g. Sequence "(a, b), e, (f, d)" meets this constraint because the last item set, (f, d), contains f.

  • CTN (containing constraint)

    The sequence must contain at least one of the specified items. For example, CTN(a,b) requires the sequence to contain a or b. The sequence "(a, c), d, (e, f)" meets this constraint, but the sequence "d, (e, f)" does not.
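
The three constraint types can be sketched in Python as predicates over a sequence modeled as a list of frozensets, one per item set. The function names stw, edw, and ctn are hypothetical; this only restates the rules above and makes no claim about how the function evaluates filters internally.

```python
from typing import FrozenSet, Iterable, List

ItemSet = FrozenSet[str]


def stw(seq: List[ItemSet], items: Iterable[str]) -> bool:
    """Start-with: the first item set contains at least one of the items."""
    return bool(seq) and not seq[0].isdisjoint(items)


def edw(seq: List[ItemSet], items: Iterable[str]) -> bool:
    """End-with: the last item set contains at least one of the items."""
    return bool(seq) and not seq[-1].isdisjoint(items)


def ctn(seq: List[ItemSet], items: Iterable[str]) -> bool:
    """Containing: some item set contains at least one of the items."""
    wanted = set(items)
    return any(not item_set.isdisjoint(wanted) for item_set in seq)


# The example sequences from the constraint descriptions.
s_stw = [frozenset({"a", "c"}), frozenset({"e"}), frozenset({"f", "d"})]
s_edw = [frozenset({"a", "b"}), frozenset({"e"}), frozenset({"f", "d"})]
s_ctn_yes = [frozenset({"a", "c"}), frozenset({"d"}), frozenset({"e", "f"})]
s_ctn_no = [frozenset({"d"}), frozenset({"e", "f"})]

print(stw(s_stw, ["c", "d"]))      # True
print(edw(s_edw, ["f", "g"]))      # True
print(ctn(s_ctn_yes, ["a", "b"]))  # True
print(ctn(s_ctn_no, ["a", "b"]))   # False
```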

Each constraint type can appear at most once in a filter. For example:

  • Valid: 'STW(c,d) EDW(g,k) CTN(e)'
  • Invalid: 'STW(c,d) STW(e,h)'

The following argument specifies # as the separator and uses it in two filters:

PathFilters('Separator(#)', 'STW(c#d) EDW(g#k) CTN(e)', 'CTN(h#k)')
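
The following Python sketch parses PathFilters values like the ones above and applies the rule that a sequence is input only if it satisfies all constraints of at least one filter. The names parse_path_filters and passes are hypothetical, the parser handles only the simplified syntax shown here (for example, it does not reject unknown tokens), and treating an empty filter list as "no filtering" is an assumption.

```python
import re
from typing import Dict, FrozenSet, List

Filter = Dict[str, List[str]]  # constraint type -> items
ItemSet = FrozenSet[str]

_SEPARATOR = re.compile(r"Separator\((.+)\)")
_CONSTRAINT = re.compile(r"(STW|EDW|CTN)\(([^)]*)\)")


def parse_path_filters(values: List[str]) -> List[Filter]:
    """Hypothetical parser for PathFilters values (simplified syntax)."""
    symbol = ","
    filter_texts = []
    for value in values:
        match = _SEPARATOR.fullmatch(value.strip())
        if match:
            symbol = match.group(1)  # applies to every filter
        else:
            filter_texts.append(value)

    filters: List[Filter] = []
    for text in filter_texts:
        flt: Filter = {}
        for kind, body in _CONSTRAINT.findall(text):
            if kind in flt:
                raise ValueError(f"duplicate {kind} constraint in {text!r}")
            flt[kind] = [item.strip() for item in body.split(symbol)]
        filters.append(flt)
    return filters


def passes(seq: List[ItemSet], filters: List[Filter]) -> bool:
    """True if seq satisfies all constraints of at least one filter."""
    checks = {
        "STW": lambda items: bool(seq) and not seq[0].isdisjoint(items),
        "EDW": lambda items: bool(seq) and not seq[-1].isdisjoint(items),
        "CTN": lambda items: any(not s.isdisjoint(items) for s in seq),
    }
    if not filters:
        return True  # assumption: no filters means every sequence is input
    return any(
        all(checks[kind](items) for kind, items in flt.items()) for flt in filters
    )


filters = parse_path_filters(["Separator(#)", "STW(c#d) EDW(g#k) CTN(e)", "CTN(h#k)"])
print(filters)
# [{'STW': ['c', 'd'], 'EDW': ['g', 'k'], 'CTN': ['e']}, {'CTN': ['h', 'k']}]
```

Collecting the separator before parsing any filter reflects the rule that the symbol applies to all filters, regardless of where the Separator value appears.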

GroupByColumns
[Optional] Specifies the names of the input table columns by which to group the input table sequences. If you specify this argument, the function operates on each group separately and copies each group_by_column to the output table.
SeqPatternTable
[Optional] Specifies the name of the table where the function outputs sequence-pattern pairs. For example, if a sequence has a partition value of "1" and contains 3 patterns with IDs 2, 9, and 10, for that sequence the function outputs the sequence-pattern pairs ("1", 2), ("1", 9), and ("1", 10).

If sequence_pattern_table does not include a schema, the function creates it in the first schema in the user's search path.

If the function finds no sequence-pattern pairs, it does not create sequence_pattern_table.
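
This section does not define when a sequence contains a pattern. Assuming the usual sequential-pattern-mining definition (each item set of the pattern is a subset of a distinct item set of the sequence, in the same order), the following Python sketch produces sequence-pattern pairs like those in the SeqPatternTable example. The helper names and the sample data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

ItemSet = FrozenSet[str]


def contains_pattern(seq: Sequence[ItemSet], pattern: Sequence[ItemSet]) -> bool:
    """True if the pattern's item sets are subsets of distinct sequence
    item sets, in the same order (greedy earliest match)."""
    pos = 0
    for pattern_set in pattern:
        while pos < len(seq) and not pattern_set <= seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1  # the next pattern set must match a later item set
    return True


def sequence_pattern_pairs(
    sequences: Dict[str, List[ItemSet]],
    patterns: Dict[int, List[ItemSet]],
) -> List[Tuple[str, int]]:
    """Hypothetical helper: one (partition value, pattern ID) pair per match."""
    return [
        (seq_id, pattern_id)
        for seq_id, seq in sequences.items()
        for pattern_id in sorted(patterns)
        if contains_pattern(seq, patterns[pattern_id])
    ]


sequences = {"1": [frozenset({"a"}), frozenset({"b", "c"}), frozenset({"d"})]}
patterns = {
    2: [frozenset({"a"})],
    4: [frozenset({"d"}), frozenset({"a"})],
    9: [frozenset({"a"}), frozenset({"c"})],
    10: [frozenset({"b", "c"}), frozenset({"d"})],
}
print(sequence_pattern_pairs(sequences, patterns))  # [('1', 2), ('1', 9), ('1', 10)]
```

Pattern 4 is not paired with sequence "1" because d occurs only after a, not before it.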

ItemColumn
[Required if you specify neither ItemDefinition nor PathColumn.] Specifies the names of the input table columns that contain the items.
ItemDefinition
[Required if you specify neither ItemColumn nor PathColumn.] Specifies the name of the item definition table and the names of its index, definition, and item columns. If item_definition_table does not include a schema, the function searches for it in the user's search path.
PathColumn
[Required if you specify neither ItemColumn nor ItemDefinition.] Specifies the name of the input table column that contains paths in the form of sequence strings. A sequence string has this syntax:
'[item [, ...]]'

In the sequence string syntax, the outer brackets are literal characters that you must type. The nPath function can generate the sequence strings in this column.

If you specify this argument, each item set can have only one item.
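
A minimal Python sketch of reading a PathColumn value into item sets, assuming items are separated by commas and contain no commas themselves. The function name and the page names are hypothetical; because PathColumn allows only one item per item set, each item becomes a single-item set.

```python
from typing import FrozenSet, List


def parse_sequence_string(text: str) -> List[FrozenSet[str]]:
    """Hypothetical parser: '[item, item, ...]' -> one single-item set per item."""
    s = text.strip()
    if not (s.startswith("[") and s.endswith("]")):
        raise ValueError(f"missing outer brackets: {text!r}")
    body = s[1:-1].strip()
    if not body:
        raise ValueError("a sequence string needs at least one item")
    # With PathColumn, each item set holds exactly one item.
    return [frozenset({item.strip()}) for item in body.split(",")]


print(parse_sequence_string("[home, search, checkout]"))
# [frozenset({'home'}), frozenset({'search'}), frozenset({'checkout'})]
```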

MinSupport
Determines the threshold for whether a sequential pattern is frequent. The minimum must be a positive real number.

If minimum is in the range (0,1], it is a relative threshold: If N is the total number of input sequences, the threshold is T=N*minimum. For example, if there are 1000 sequences in the input table and minimum is 0.05, the threshold is 50.

If minimum is in the range (1, +∞), it is an absolute threshold: Regardless of N, T=minimum. For example, if minimum is 50, the threshold is 50, regardless of N.

A pattern is frequent if its support value is at least T.
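
The threshold rule can be written as a small Python function. This is a sketch of the arithmetic described above; the names support_threshold and is_frequent are hypothetical, and support is taken as a given count rather than computed.

```python
def support_threshold(minimum: float, n_sequences: int) -> float:
    """Return T for MinSupport value `minimum` and N input sequences."""
    if minimum <= 0:
        raise ValueError("minimum must be a positive real number")
    if minimum <= 1:
        return n_sequences * minimum  # relative threshold: T = N * minimum
    return minimum  # absolute threshold: T = minimum, regardless of N


def is_frequent(support: int, minimum: float, n_sequences: int) -> bool:
    """A pattern is frequent if its support is at least T."""
    return support >= support_threshold(minimum, n_sequences)


print(support_threshold(50, 1000))  # 50
print(is_frequent(49, 0.05, 1000))  # False
```

Note that a minimum of exactly 1 falls in the relative range (0,1], so it yields T=N rather than T=1.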

Because the function outputs only frequent patterns, minimum controls the number of output patterns. If minimum is small, processing time increases exponentially; therefore, Teradata recommends starting with a larger value, for example, 5% of the total number of sequences if you know N, and 0.05 otherwise.

If you specify a relative minimum and GroupByColumns, the function calculates N and T for each group.

If you specify a relative minimum and PathFilters, N is the number of sequences that meet the constraints of the filters.

MaxLength
[Optional] Specifies the maximum length of the output sequential patterns. The length of a pattern is its number of sets. Default: No maximum length.
MinLength
[Optional] Specifies the minimum length of the output sequential patterns. Default: 1.
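
The length rules for MinLength and MaxLength can be sketched in Python as follows, with a pattern modeled as a list of item sets. The helper names are hypothetical; the point is that a pattern's length counts item sets, not items.

```python
from typing import FrozenSet, List, Optional

Pattern = List[FrozenSet[str]]


def pattern_length(pattern: Pattern) -> int:
    # Length counts item sets, not individual items.
    return len(pattern)


def within_length(
    pattern: Pattern, min_length: int = 1, max_length: Optional[int] = None
) -> bool:
    """Mirror MinLength (default 1) and MaxLength (default: no maximum)."""
    n = pattern_length(pattern)
    return n >= min_length and (max_length is None or n <= max_length)


p = [frozenset({"a", "b"}), frozenset({"c"})]  # three items, but length 2
print(pattern_length(p))               # 2
print(within_length(p, max_length=1))  # False
```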
ClosedPattern
[Optional] Specifies whether to output only closed patterns. Default: 'false'.
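
This section does not define a closed pattern. In sequential pattern mining, a frequent pattern is usually called closed when no proper super-pattern has the same support; assuming FrequentPaths uses that meaning, the following Python sketch keeps only closed patterns from a table of pattern supports. The helper names are hypothetical, the quadratic scan is for clarity rather than speed, and containment is assumed to mean that each item set of the smaller pattern is a subset of a distinct item set of the larger one, in order.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Pattern = Tuple[FrozenSet[str], ...]


def contains_pattern(big: Sequence[FrozenSet[str]], small: Sequence[FrozenSet[str]]) -> bool:
    """True if small's item sets are subsets of distinct item sets of big, in order."""
    pos = 0
    for small_set in small:
        while pos < len(big) and not small_set <= big[pos]:
            pos += 1
        if pos == len(big):
            return False
        pos += 1
    return True


def closed_patterns(supports: Dict[Pattern, int]) -> List[Pattern]:
    """Keep patterns that have no super-pattern with the same support."""
    closed = []
    for pattern, support in supports.items():
        absorbed = any(
            other != pattern
            and other_support == support
            and contains_pattern(other, pattern)
            for other, other_support in supports.items()
        )
        if not absorbed:
            closed.append(pattern)
    return closed


A = (frozenset({"a"}),)
B = (frozenset({"b"}),)
AB = (frozenset({"a"}), frozenset({"b"}))
print(closed_patterns({A: 3, AB: 3, B: 4}) == [AB, B])  # True
```

Here A is not closed because its super-pattern AB has the same support (3), while B stays closed because AB's support differs from B's.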