Teradata R Package Function Reference | 17.00 - 17.00 - NTree - Teradata R Package

Teradata® R Package Function Reference

prodname
Teradata R Package
vrm_release
17.00
created_date
September 2020
category
Programming Reference
featnum
B700-4007-090K

Description

The NTree function is a hierarchical analysis function that can build and traverse tree structures on all worker machines. The function reads the data only once from the disk and creates the trees in memory.

Usage

  td_ntree_mle (
      data = NULL,
      data.partition.column = "1",
      data.order.column = NULL,
      root.node = NULL,
      node.id = NULL,
      parent.id = NULL,
      allow.cycles = FALSE,
      starts.with = NULL,
      mode = NULL,
      output = NULL,
      max.distance = 5,
      logging = FALSE,
      result = NULL,
      data.sequence.column = NULL
  )

Arguments

data

Required Argument.
Specifies the input tbl_teradata containing the input data.

data.partition.column

Optional Argument.
Specifies Partition By columns for "data".
Values to this argument can be provided as a vector, if multiple columns are used for partition.
Default Value: "1"
Types: character OR vector of Strings (character)

data.order.column

Optional Argument.
Specifies Order By columns for "data".
Values to this argument can be provided as a vector, if multiple columns are used for ordering.
Types: character OR vector of Strings (character)

root.node

Required Argument.
Specifies the logical SQL expression that defines the root nodes of the trees (for example, parent.id IS NULL).
Types: character

node.id

Required Argument.
Specifies the SQL expression whose value uniquely identifies a node in the input tbl_teradata.
Note: A node can appear multiple times in the data set, with different parents.
Types: character

parent.id

Required Argument.
Specifies the SQL expression whose value identifies the parent node.
Types: character

allow.cycles

Optional Argument.
Specifies whether trees can contain cycles. If not, a cycle in the data set causes the function to throw an exception.
Default Value: FALSE
Types: logical

starts.with

Required Argument.
Specifies the node from which to start tree traversal.
Permitted Values: ROOT, LEAF, or a SQL expression that identifies a node.
Types: character

mode

Required Argument.
Specifies the direction of tree traversal from the start node, either up to the root node or down to the leaf nodes.
Permitted Values: UP, DOWN
Types: character

output

Required Argument.
Specifies when to output a tuple, either at every node along the traversal path ("all") or only at the end of the traversal path ("end").
Default Value: END
Permitted Values: END, ALL
Types: character

max.distance

Optional Argument.
Specifies the maximum tree depth.
Default Value: 5
Types: integer

logging

Optional Argument.
Specifies whether the function prints log messages.
Default Value: FALSE
Types: logical

result

Required Argument.
Specifies aggregate operations to perform during tree traversal. The function reports the result of each aggregate operation in the output tbl_teradata. The syntax of aggregate is:
operation (expression) [ ALIAS alias ], where

  1. operation is either PATH, SUM, LEVEL, MAX, MIN, IS_CYCLE, AVG, or PROPAGATE.

  2. expression is a SQL expression. If operation is LEVEL or IS_CYCLE, then expression must be *.

  3. alias is the name of the output tbl_teradata column that contains the result of the operation.

The default value is the string "operation(expression)" without the quotation marks. For example, PATH(node_name).
Note: The function ignores alias if it is the same as an input tbl_teradata column name.
For the path from the "starts.with" node to the last traversed node, the operations do the following:

  1. PATH: Outputs the value of expression for each node, separating values with "->".

  2. SUM: Computes the value of expression for each node and outputs the sum of these values.

  3. LEVEL: Outputs the number of hops.

  4. MAX: Computes the value of expression for each node and outputs the highest of these values.

  5. MIN: Computes the value of expression for each node and outputs the lowest of these values.

  6. IS_CYCLE: Outputs the cycle (if any).

  7. AVG: Computes the value of expression for each node and outputs the average of these values.

  8. PROPAGATE: Evaluates expression with the value of the "starts.with" node and propagates the result to every node.

Types: character

data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

Value

Function returns an object of class "td_ntree_mle" which is a named list containing object of class "tbl_teradata".
Named list member can be referenced directly with the "$" operator using the name: result.

Examples

    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Load example data
    loadExampleData("ntree_example", "employee_table", "emp_table_by_dept")

    # Create object(s) of class "tbl_teradata".
    employee_table <- tbl(con, "employee_table")
    emp_table_by_dept <- tbl(con, "emp_table_by_dept")

    # Example 1 - This example finds the employees who report to employee
    # 100 (either directly or indirectly) by traversing the tree
    # of employees from employee 100 downward.
    td_ntree_out1 <- td_ntree_mle(data=employee_table,
                                  root.node = 'mgr_id is NULL',
                                  node.id='emp_id',
                                  parent.id='mgr_id',
                                  starts.with='emp_id=100',
                                  mode='down',
                                  output='end',
                                  result='PATH(emp_name) AS path'
                                 )

    # Example 2 - This example finds the reporting structure by department.
    td_ntree_out2 <- td_ntree_mle(data=emp_table_by_dept,
                                  data.partition.column='department',
                                  root.node = "mgr_id = 'none'",
                                  node.id='id',
                                  parent.id='mgr_id',
                                  starts.with='root',
                                  mode='down',
                                  output='all',
                                  result='PATH(name) AS path, PATH(id) as path2'
                                 )