PathAnalyzer - Teradata R Package

Teradata® R Package Function Reference

Product: Teradata R Package
Release Number: 16.20
Published: February 2020
Language: English (United States)
Last Update: 2020-02-28
dita:id: B700-4007
lifecycle: previous
Product Category: Teradata Vantage

Description

The PathAnalyzer function (td_path_analyzer_mle):

  1. Inputs a set of paths to the PathGenerator function (td_path_generator_mle).

  2. Inputs the PathGenerator output to the PathSummarizer function (td_path_summarizer_mle).

  3. Inputs the PathSummarizer output to the PathStart function (td_path_start_mle), which outputs, for each parent, all of its children and the number of times that each child was traveled.
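
The final result of the three steps above (children and travel counts per parent) can be illustrated locally with a minimal base R sketch. It does not call Vantage and is not the implementation used by the function. The sample paths are made up, and treating a "parent" as the prefix of a path up to a given step is an assumption made for illustration only.

```r
# Illustrative only: plain base R, not the Vantage implementation.
# Assumption: a "parent" is the prefix of a path up to a given step,
# and a "child" is the symbol that follows that prefix.
paths <- c("A,B,C", "A,B,D", "A,B,C", "A,E")   # hypothetical sample paths
delimiter <- ","

# Split each path string into its symbols.
symbols <- strsplit(paths, delimiter, fixed = TRUE)

# For every step in every path, record the parent prefix and the child symbol.
steps <- do.call(rbind, lapply(symbols, function(s) {
  data.frame(
    parent = vapply(seq_len(length(s) - 1),
                    function(i) paste(s[seq_len(i)], collapse = delimiter),
                    character(1)),
    child  = s[-1],
    stringsAsFactors = FALSE
  )
}))

# Count how many times each child follows each parent.
child_counts <- aggregate(cnt ~ parent + child,
                          data = transform(steps, cnt = 1L),
                          FUN = sum)
child_counts <- child_counts[order(child_counts$parent, child_counts$child), ]
print(child_counts)
```

In this sample, the child "B" follows the parent "A" three times, and the child "C" follows the parent "A,B" twice.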

Usage

  td_path_analyzer_mle (
      data = NULL,
      seq.column = NULL,
      count.column = NULL,
      hash = FALSE,
      delimiter = ",",
      data.sequence.column = NULL
  )

Arguments

data

Required Argument.
Specifies either the input tbl_teradata that contains the paths to analyze, or the output of the td_npath_sqle function.
Each path is a string of alphanumeric symbols that represents an ordered sequence of page views (or actions). Typically each symbol is a code that represents a unique page view.

seq.column

Required Argument.
Specifies the name of the input tbl_teradata column that contains the paths.
Types: character

count.column

Optional Argument.
Specifies the name of the input tbl_teradata column that contains the number of times a path was traveled.
Note: If this argument is not specified, the function auto-generates a column named "cnt" that contains the count of each unique path in the column specified by "seq.column".
Types: character
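
The note above can be read as "one count per distinct path". A minimal base R sketch of that reading follows. It runs locally on a made-up data frame and does not reproduce how the function computes the column on Vantage; the variable names are hypothetical.

```r
# Illustrative only: local base R, not the Vantage implementation.
# Hypothetical input with a "path" column and no count column.
paths <- data.frame(path = c("A,B", "A,B", "A,C"), stringsAsFactors = FALSE)

# Count the occurrences of each distinct path and name the result "cnt".
counted <- as.data.frame(table(path = paths$path),
                         responseName = "cnt",
                         stringsAsFactors = FALSE)
print(counted)
```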

hash

Optional Argument.
Specifies whether to include the hash code of the output column "node".
Default Value: FALSE
Types: logical

delimiter

Optional Argument.
Specifies the single-character delimiter that separates symbols in the path string.
Note: Do not use any of the following characters as the delimiter (they cause the function to fail): Asterisk (*), Plus (+), Left parenthesis ((), Right parenthesis ()), Single quotation mark ('), Escaped single quotation mark (\'), Backslash (\).
Default Value: ","
Types: character
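
Because a disallowed delimiter causes the function to fail, it may help to check the value before making the call. The helper below is a hypothetical sketch in base R and is not part of the Teradata R Package; it only encodes the restrictions listed in the note above.

```r
# Hypothetical helper, not part of the Teradata R Package.
# Returns TRUE only for a single character that is not in the forbidden list.
is_valid_delimiter <- function(delimiter) {
  # Characters the note above lists as causing the function to fail.
  # The escaped single quotation mark (\') is two characters long, so the
  # single-character check already rejects it.
  forbidden <- c("*", "+", "(", ")", "'", "\\")
  is.character(delimiter) &&
    length(delimiter) == 1 &&
    !is.na(delimiter) &&
    nchar(delimiter) == 1 &&
    !(delimiter %in% forbidden)
}

is_valid_delimiter(",")    # TRUE
is_valid_delimiter("*")    # FALSE
is_valid_delimiter("\\")   # FALSE (a single backslash)
is_valid_delimiter("->")   # FALSE (more than one character)
```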

data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

Value

The function returns an object of class "td_path_analyzer_mle", which is a named list containing Teradata tbl objects.
Named list members can be referenced directly with the "$" operator using the following names:

  1. output.table

  2. output

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("pathgenerator_example", "clickstream1")
    
    # Create remote tibble objects.
    clickstream1 <- tbl(con, "clickstream1")
    
    # Example 1 - Use the click stream data to run path analysis functions
    td_path_analyzer_mle_out1 <- td_path_analyzer_mle(data = clickstream1,
                                                     seq.column = "path",
                                                     count.column = "cnt"
                                                     )

    # Example 2 - Exclude count.column argument, and the function generates the count column
    td_path_analyzer_mle_out2 <- td_path_analyzer_mle(data = clickstream1,
                                                     seq.column = "path"
                                                     )

    # Example 3 - Use the output of NPath (td_npath_sqle) function as an input
    loadExampleData("npath_example1", "bank_web_clicks2")
    
    # Create remote tibble objects.
    bank_web_clicks2 <- tbl(con, "bank_web_clicks2")
    
    # Execute npath function.
    td_npath_out <- td_npath_sqle(
                       data1 = bank_web_clicks2,
                       data1.partition.column = c("customer_id", "session_id"),
                       data1.order.column = "datestamp",
                       mode = "nonoverlapping",
                       pattern = "A*",
                       symbols = c("true AS A"),
                       result = c("ACCUMULATE (page OF A) AS page_path")
                       )

    # This takes the td_npath_out object as input and the count column gets auto-generated
    td_path_analyzer_mle_out3 <- td_path_analyzer_mle(data = td_npath_out,
                                                     seq.column = "page_path"
                                                     )