Teradata Package for R Function Reference | 17.20 - NPath - Teradata Package for R - Look here for syntax, methods and examples for the functions included in the Teradata Package for R.

Teradata® Package for R Function Reference

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Teradata Package for R
Release Number
17.20
Published
March 2024
Language
English (United States)
Last Update
2024-05-03
dita:id
TeradataR_FxRef_Enterprise_1720
Product Category
Teradata Vantage

NPath

Description

The td_npath_sqle() function scans a set of rows, looking for patterns that you specify. For each set of input rows that matches the pattern, td_npath_sqle() produces a single output row. The function provides a flexible pattern-matching capability that lets you specify complex patterns in the input data and define the values that are output for each matched input set.

Usage

  td_npath_sqle (
      data1 = NULL,
      mode = NULL,
      pattern = NULL,
      symbols = NULL,
      result = NULL,
      filter = NULL,
      data2 = NULL,
      data3 = NULL,
      data4 = NULL,
      data5 = NULL,
      data6 = NULL,
      data7 = NULL,
      data8 = NULL,
      data9 = NULL,
      data10 = NULL,
      ...
  )

Arguments

data1

Required Argument.
Specifies the input tbl_teradata containing the input data set.
Types: tbl_teradata

mode

Required Argument.
Specifies the pattern-matching mode:

  • OVERLAPPING: The function finds every occurrence of the pattern in the partition, regardless of whether it is part of a previously found match. Therefore, one row can match multiple symbols in a given matched pattern.

  • NONOVERLAPPING: The function begins the next pattern search at the row that follows the last pattern match. This is the default behavior of many commonly used pattern matching utilities, including the UNIX grep utility.

Permitted Values: "OVERLAPPING", "NONOVERLAPPING"
Types: character

pattern

Required Argument.
Specifies the pattern for which the function searches. You compose pattern with the symbols that you define in the symbols argument, operators, and parentheses.
When patterns have multiple operators, the function applies them in order of precedence, and applies operators of equal precedence from left to right. To specify that a subpattern must appear a specific number of times, use the Range-Matching Feature.
The basic pattern operators in decreasing order of precedence "pattern", "pattern.", "pattern?", "pattern*", "pattern+", "pattern1.pattern2", "pattern1|pattern2", "^pattern", "pattern$" To force the function to evaluate a subpattern first, enclose it in parentheses.
Example:
^A.(B|C)+.D?.X*.A$ The preceding pattern definition matches any set of rows whose first row starts with the definition of symbol A, followed by a non-empty sequence of rows, each of which meets the definition of either symbol B or C, optionally followed by one row that meets the definition of symbol D, followed by any number of rows that meet the definition of symbol X, and ending with a row that ends with the definition of symbol A.
You can use parentheses to define precedence rules. Parentheses are recommended for clarity, even where not strictly required.
Types: character

symbols

Required Argument.
Specifies the symbols that appear in the values of the pattern and result arguments. The col_expr is an expression whose value is a column name, symbol is any valid identifier, and symbol_predicate is a SQL predicate (often a column name).
For example, the 'symbols' argument for analyzing website visits might look like this:
Symbols ( pagetype = "homepage" AS H, pagetype <> "homepage" AND pagetype <> "checkout" AS PP, pagetype = "checkout" AS CO ) The symbol is case-insensitive; however, a symbol of one or two uppercase letters is easy to identify in patterns.
If col_expr represents a column that appears in multiple input tbl_teradata objects, then you must qualify the ambiguous column name with the SQL name corresponding to it's tbl_teradata name.
For example:
Symbols ( input1.pagetype = "homepage" AS H, input1.pagetype = "thankyou" AS T, input2.adname = "xmaspromo" AS X, input2.adname = "realtorpromo" AS R ) The mapping from tbl_teradata name to its corresponding SQL name is as shown below:

  • data1: input1

  • data2: input2

  • data3: input3

You can create symbol predicates that compare a row to a previous or subsequent row, using a LAG or LEAD operator.
LAG Expression Syntax:
current_expr operator LAG (previous_expr, lag_rows [, default]) | LAG (previous_expr, lag_rows [, default]) operator current_expr LAG and LEAD Expression Rules:
• A symbol definition can have multiple LAG and LEAD expressions.
• A symbol definition that has a LAG or LEAD expression cannot have an OR operator.
• If a symbol definition has a LAG or LEAD expression and the input is not a table, you must create an alias of the input query.
Types: character OR vector of Strings (character)

result

Required Argument.
Specifies the output columns. The col_expr is an expression whose value
is a column name; it specifies the values to retrieve from the matched rows. The function applies aggregate function to these values.
Supported aggregate functions:
• SQL aggregate functions are [AVG, COUNT, MAX, MIN, SUM].
• ML Engine nPath sequence aggregate functions.
The function evaluates this argument once for every matched pattern in the partition (that is, it outputs one row for each pattern match).
Note:
For col_expr representing a column that appears in multiple input tbl_teradata objects, you must qualify the ambiguous column name with the SQL name corresponding to it's tbl_teradata name. Please see the description of the 'symbols' parameter for the mapping from tbl_teradata name to the SQL name.
Types: character OR vector of Strings (character)

filter

Optional Argument.
Specifies filters to impose on the matched rows. The function combines the filter expressions using the AND operator.
The filter_expression syntax is:
symbol_expression comparison_operator symbol_expression The two symbol expressions must be type-compatible.
The symbol_expression syntax is:
FIRST | LAST (column_with_expression OF [ANY](symbol[,...])) The column_with_expression cannot contain the operator AND or OR, and all its columns must come from the same input. If the function has multiple inputs, then column_with_expression and symbol must come from the same input.
The comparison_operator is either <, >, <=, >=, =, or <>.
Note:
For column_with_expression representing a column that appears in multiple input tbl_teradata objects, you must qualify the ambiguous column name with the SQL name corresponding to it's tbl_teradata name. Please see the description of the 'symbols' parameter for the mapping from tbl_teradata name to the SQL name.
Types: character OR vector of Strings (character)

data2

Optional Arguments.
Specifies the additional optional input tbl_teradatas containing the input data.
Types: tbl_teradata

data3

Optional Arguments.
Specifies the additional optional input tbl_teradatas containing the input data.
Types: tbl_teradata

...

Specifies the generic keyword arguments SQLE functions accept.
Below are the generic keyword arguments:

persist:
Optional Argument.
Specifies whether to persist the results of the function in a table or not.
When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to put the results of the function in a volatile table or not.
When set to TRUE, results are stored in a volatile table, otherwise not.
Default Value: FALSE
Types: logical

Function allows the user to partition, hash, order or local order the input data. These generic arguments are available for each argument that accepts tbl_teradata as input and can be accessed as:

  • "<input.data.arg.name>.partition.column" accepts character OR vector of Strings (character) (Strings)

  • "<input.data.arg.name>.hash.column" accepts character OR vector of Strings (character) (Strings)

  • "<input.data.arg.name>.order.column" accepts character OR vector of Strings (character) (Strings)

  • "local.order.<input.data.arg.name>" accepts logical

Note:
These generic arguments are supported by tdplyr if the underlying SQL Engine function supports, else an exception is raised.

Value

Function returns an object of class "td_npath_sqle" which is a named list containing object of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator using the name(s):result

Examples

  
    
    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("npath_example2","impressions","clicks2", "tv_spots",
                    "clickstream")

    # Create input tbl_teradata objects.
    impressions <- tbl(con, "impressions")
    clicks2 <- tbl(con, "clicks2")
    tv_spots <- tbl(con, "tv_spots")
    clickstream <- tbl(con, "clickstream")

    # Check the list of available analytic functions.
    display_analytic_functions()

    # Example 1: Search for pattern '(imp|tv_imp)*.click' in the provided
    #           data(impressions, clicks2, tv_spots).

    # Run td_npath_sqle function with the required patterns to get the rows 
    # which have specified pattern. Rows that matches the pattern.
    obj <- td_npath_sqle(data1=impressions,
                         data1.partition.column='userid',
                         data1.order.column='ts',
                         data2=clicks2,
                         data2.partition.column='userid',
                         data2.order.column='ts',
                         data3=tv_spots,
                         data3.partition.column='ts',
                         data3.order.column='ts',
                         result=c('COUNT(* of imp) as imp_cnt',
                                  'COUNT(* of tv_imp) as tv_imp_cnt'),
                         mode='nonoverlapping',
                         pattern='(imp|tv_imp)*.click',
                         symbols=c('true as imp','true as click',
                                   'true as tv_imp'))

    # Print the result.
    print(obj$result)

    # Example 2: Search for pattern 'home.clickview*.checkout' in the provided
    #           data set clickstream.

    # Run td_npath_sqle function with the required patterns to get the rows which
    # has specified pattern and filter the rows with the filter,
    # where filter and result have ML Engine nPath sequence aggregate functions
    # like 'FIRST', 'COUNT' and 'LAST'.
    obj <- td_npath_sqle(data1=clickstream,
                         data1.partition.column='userid',
                         data1.order.column='clicktime',
                         result=c('FIRST(userid of ANY(home, checkout,
                                                       clickview)) AS userid',
                                  'FIRST (sessionid of ANY(home, checkout,
                                   clickview)) AS sessioinid',
                                  'COUNT (* of any(home, checkout,
                                   clickview)) AS cnt1',
                                  'FIRST (clicktime of ANY(home)) AS firsthome',
                                  'LAST (clicktime of ANY(checkout)) 
                                   AS lastcheckout'),
                         mode='nonoverlapping',
                         pattern='home.clickview*.checkout',
                         symbols=c("cnt='home' AS home",
                                   "cnt <> 'home' AND cnt <> 'checkout'
                                    AS clickview",
                                   "cnt='checkout' AS checkout"),
                         filter = "FIRST (clicktime OF ANY (home))
                                  < FIRST (clicktime of any(checkout))")

    # Print the result.
    print(obj$result)