Teradata Package for R Function Reference | 17.00 - NTree - Teradata Package for R - Look here for syntax, methods and examples for the functions included in the Teradata Package for R.

NTree

Description

The NTree function is a hierarchical analysis function that can build and traverse tree structures on all worker machines. The function reads the data only once from the disk and creates the trees in memory.

Usage

  td_ntree_mle (
      data = NULL,
      data.partition.column = "1",
      data.order.column = NULL,
      root.node = NULL,
      node.id = NULL,
      parent.id = NULL,
      allow.cycles = FALSE,
      starts.with = NULL,
      mode = NULL,
      output = NULL,
      max.distance = 5,
      logging = FALSE,
      result = NULL,
      data.sequence.column = NULL
  )

Arguments

`data`	Required Argument. Specifies the input tbl_teradata containing the input data.
`data.partition.column`	Optional Argument. Specifies Partition By columns for "data". Values to this argument can be provided as a vector, if multiple columns are used for partition. Default Value: "1" Types: character OR vector of Strings (character)
`data.order.column`	Optional Argument. Specifies Order By columns for "data". Values to this argument can be provided as a vector, if multiple columns are used for ordering. Types: character OR vector of Strings (character)
`root.node`	Required Argument. Specifies the logical SQL expression that defines the root nodes of the trees (for example, parent.id IS NULL). Types: character
`node.id`	Required Argument. Specifies the SQL expression whose value uniquely identifies a node in the input tbl_teradata. Note: A node can appear multiple times in the data set, with different parents. Types: character
`parent.id`	Required Argument. Specifies the SQL expression whose value identifies the parent node. Types: character
`allow.cycles`	Optional Argument. Specifies whether trees can contain cycles. If not, a cycle in the data set causes the function to throw an exception. Default Value: FALSE Types: logical
`starts.with`	Required Argument. Specifies the node from which to start tree traversal. Permitted Values: ROOT, LEAF, or a SQL expression that identifies a node. Types: character
`mode`	Required Argument. Specifies the direction of tree traversal from the start node, either up to the root node or down to the leaf nodes. Permitted Values: UP, DOWN Types: character
`output`	Required Argument. Specifies when to output a tuple, either at every node along the traversal path ("all") or only at the end of the traversal path ("end"). Default Value: END Permitted Values: END, ALL Types: character
`max.distance`	Optional Argument. Specifies the maximum tree depth. Default Value: 5 Types: integer
`logging`	Optional Argument. Specifies whether the function prints log messages. Default Value: FALSE Types: logical
`result`	Required Argument. Specifies aggregate operations to perform during tree traversal. The function reports the result of each aggregate operation in the output tbl_teradata. The syntax of aggregate is: operation (expression) [ ALIAS alias ], where operation is either PATH, SUM, LEVEL, MAX, MIN, IS_CYCLE, AVG, or PROPAGATE. expression is a SQL expression. If operation is LEVEL or IS_CYCLE, then expression must be *. alias is the name of the output tbl_teradata column that contains the result of the operation. The default value is the string "operation(expression)" without the quotation marks. For example, PATH(node_name). Note: The function ignores alias if it is the same as an input tbl_teradata column name. For the path from the "starts.with" node to the last traversed node, the operations do the following: PATH: Outputs the value of expression for each node, separating values with "->". SUM: Computes the value of expression for each node and outputs the sum of these values. LEVEL: Outputs the number of hops. MAX: Computes the value of expression for each node and outputs the highest of these values. MIN: Computes the value of expression for each node and outputs the lowest of these values. IS_CYCLE: Outputs the cycle (if any). AVG: Computes the value of expression for each node and outputs the average of these values. PROPAGATE: Evaluates expression with the value of the "starts.with" node and propagates the result to every node. Types: character
`data.sequence.column`	Optional Argument. Specifies the vector of column(s) that uniquely identifies each row of the input argument "data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run. Types: character OR vector of Strings (character)

Value

Function returns an object of class "td_ntree_mle" which is a named list containing object of class "tbl_teradata".
Named list member can be referenced directly with the "$" operator using the name: result.

Examples

  
    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Load example data
    loadExampleData("ntree_example", "employee_table", "emp_table_by_dept")

    # Create object(s) of class "tbl_teradata".
    employee_table <- tbl(con, "employee_table")
    emp_table_by_dept <- tbl(con, "emp_table_by_dept")

    # Example 1 - This example finds the employees who report to employee
    # 100 (either directly or indirectly) by traversing the tree
    # of employees from employee 100 downward.
    td_ntree_out1 <- td_ntree_mle(data=employee_table,
                                  root.node = 'mgr_id is NULL',
                                  node.id='emp_id',
                                  parent.id='mgr_id',
                                  starts.with='emp_id=100',
                                  mode='down',
                                  output='end',
                                  result='PATH(emp_name) AS path'
                                 )

    # Example 2 - This example finds the reporting structure by department.
    td_ntree_out2 <- td_ntree_mle(data=emp_table_by_dept,
                                  data.partition.column='department',
                                  root.node = "mgr_id = 'none'",
                                  node.id='id',
                                  parent.id='mgr_id',
                                  starts.with='root',
                                  mode='down',
                                  output='all',
                                  result='PATH(name) AS path, PATH(id) as path2'
                                 )