Teradata R Package Function Reference - NTree - Teradata R Package - Look here for syntax, methods and examples for the functions included in the Teradata R Package.

Teradata® R Package Function Reference

Product
Teradata R Package
Release Number
16.20
Published
February 2020
Language
English (United States)
Last Update
2020-02-28
dita:id
B700-4007
lifecycle
previous
Product Category
Teradata Vantage

Description

The NTree (td_ntree_mle) function is a hierarchical analysis SQL function that can build and traverse tree structures on all worker machines. The function reads the data only once from the disk and creates the trees in memory.

Usage

  td_ntree_mle (
      data = NULL,
      data.partition.column = "1",
      data.order.column = NULL,
      root.node = NULL,
      node.id = NULL,
      parent.id = NULL,
      allow.cycles = FALSE,
      starts.with = NULL,
      mode = NULL,
      output = NULL,
      max.distance = 5,
      logging = FALSE,
      result = NULL,
      data.sequence.column = NULL
  )

Arguments

data

Required Argument.
Specifies the input tbl_teradata that contains the input table.

data.partition.column

Optional Argument.
Partition By columns for 'data'. Values to this argument can be provided as vector, if multiple columns are used for partition. Default Value: 1 Types: character OR vector of Strings (character)

data.order.column

Optional Argument.
Order By columns for 'data'. Values to this argument can be provided as vector, if multiple columns are used for ordering. Types: character OR vector of Strings (character)

root.node

Required Argument.
Specifies the logical SQL expression that defines the root nodes of the trees (for example, parent.id IS NULL).

node.id

Required Argument.
Specifies the SQL expression whose value uniquely identifies a node in the input tbl_teradata (for example, order_id).
Note: A node can appear multiple times in the data set, with different parents.

parent.id

Required Argument.
Specifies the SQL expression whose value identifies the parent node.

allow.cycles

Optional Argument.
Specifies whether trees can contain cycles. If not, a cycle in the data set causes the function to throw an exception.
Default Value: FALSE

starts.with

Required Argument.
Specifies the node from which to start tree traversal.
Permitted Values: ROOT, LEAF, or a SQL expression that identifies a node.

mode

Required Argument.
Specifies the direction of tree traversal from the start node, either up to the root node or down to the leaf nodes.
Permitted Values: UP, DOWN

output

Required Argument.
Specifies when to output a tuple either at every node along the traversal path ("all") or only at the end of the traversal path ("end").
Default Value: END
Permitted Values: END, ALL

max.distance

Optional Argument.
Specifies the maximum tree depth.
Default Value: 5

logging

Optional Argument.
Specifies whether the function prints log messages.
Default Value: FALSE

result

Required Argument. Specifies aggregate operations to perform during tree traversal. The function reports the result of each aggregate operation in the output table. The syntax of aggregate is:
operation (expression) [ ALIAS alias ], where

  1. operation is either PATH, SUM, LEVEL, MAX, MIN, IS_CYCLE, AVG, or PROPAGATE.

  2. expression is a SQL expression. If operation is LEVEL or IS_CYCLE, then expression must be *.

  3. alias is the name of the output tbl_teradata column that contains the result of the operation.

The default value is the string "operation(expression)" without the quotation marks. For example, PATH(node_name).
Note: The function ignores alias if it is the same as an input tbl_teradata column name. For the path from the "starts.with" node to the last traversed node, the operations do the following:

  1. PATH: Outputs the value of expression for each node, separating values with "->".

  2. SUM: Computes the value of expression for each node and outputs the sum of these values.

  3. LEVEL: Outputs the number of hops.

  4. MAX: Computes the value of expression for each node and outputs the highest of these values.

  5. MIN: Computes the value of expression for each node and outputs the lowest of these values.

  6. IS_CYCLE: Outputs the cycle (if any).

  7. AVG: Computes the value of expression for each node and outputs the average of these values.

  8. PROPAGATE: Evaluates expression with the value of the 'starts.with' node and propagates the result to every node.

data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.

Value

Function returns an object of class "td_ntree_mle" which is a named list containing Teradata tbl object.
Named list member can be referenced directly with the "$" operator using name: result.

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data
    loadExampleData("ntree_example", "employee_table", "emp_table_by_dept")
    
    # Create remote tibble objects
    employee_table <- tbl(con, "employee_table")
    emp_table_by_dept <- tbl(con, "emp_table_by_dept")
    
    # Example 1 - This example finds the employees who report to employee
    # 100 (either directly or indirectly) by traversing the tree 
    # of employees from employee 100 downward.
    td_ntree_out1 <- td_ntree_mle(data=employee_table,
                              root.node = 'mgr_id is NULL',
                              node.id='emp_id',
                              parent.id='mgr_id',
                              starts.with='emp_id=100',
                              mode='down',
                              output='end',
                              result='PATH(emp_name) AS path'
                             )
    
    # Example 2 - This example finds the reporting structure by department.
    td_ntree_out2 <- td_ntree_mle(data=emp_table_by_dept,
                              data.partition.column='department',
                              root.node = "mgr_id = 'none'",
                              node.id='id',
                              parent.id='mgr_id',
                              starts.with='root',
                              mode='down',
                              output='all',
                              result='PATH(name) AS path, PATH(id) as path2'
                             )