Description
The NTree (td_ntree_mle
) function is a hierarchical analysis SQL
function that can build and traverse tree structures on all worker
machines. The function reads the data only once from the disk and
creates the trees in memory.
Usage
td_ntree_mle (
data = NULL,
data.partition.column = "1",
data.order.column = NULL,
root.node = NULL,
node.id = NULL,
parent.id = NULL,
allow.cycles = FALSE,
starts.with = NULL,
mode = NULL,
output = NULL,
max.distance = 5,
logging = FALSE,
result = NULL,
data.sequence.column = NULL
)
Arguments
data |
Required Argument.
Specifies the input tbl_teradata that contains the input table.
|
data.partition.column |
Optional Argument.
Partition By columns for 'data'.
Values to this argument can be provided as vector, if multiple
columns are used for partition.
Default Value: 1
Types: character OR vector of Strings (character)
|
data.order.column |
Optional Argument.
Order By columns for 'data'.
Values to this argument can be provided as vector, if multiple
columns are used for ordering.
Types: character OR vector of Strings (character)
|
root.node |
Required Argument.
Specifies the logical SQL expression that defines the root nodes of
the trees (for example, parent.id IS NULL).
|
node.id |
Required Argument.
Specifies the SQL expression whose value uniquely identifies a node
in the input tbl_teradata (for example, order_id).
Note: A node can appear multiple times in the data set, with
different parents.
|
parent.id |
Required Argument.
Specifies the SQL expression whose value identifies the parent node.
|
allow.cycles |
Optional Argument.
Specifies whether trees can contain cycles. If not, a cycle in the
data set causes the function to throw an exception.
Default Value: FALSE
|
starts.with |
Required Argument.
Specifies the node from which to start tree traversal.
Permitted Values: ROOT, LEAF, or a SQL expression that identifies a node.
|
mode |
Required Argument.
Specifies the direction of tree traversal from the start node, either up
to the root node or down to the leaf nodes.
Permitted Values: UP, DOWN
|
output |
Required Argument.
Specifies when to output a tuple either at every node along the traversal
path ("all") or only at the end of the traversal path ("end").
Default Value: END
Permitted Values: END, ALL
|
max.distance |
Optional Argument.
Specifies the maximum tree depth.
Default Value: 5
|
logging |
Optional Argument.
Specifies whether the function prints log messages.
Default Value: FALSE
|
result |
Required Argument.
Specifies aggregate operations to perform during tree traversal. The
function reports the result of each aggregate operation in the output
table. The syntax of aggregate is:
operation (expression) [ ALIAS alias ], where
operation is either PATH, SUM, LEVEL, MAX, MIN, IS_CYCLE, AVG, or
PROPAGATE.
expression is a SQL expression. If operation is LEVEL or IS_CYCLE,
then expression must be *.
alias is the name of the output tbl_teradata column that contains the
result of the operation.
The default value is the string "operation(expression)" without the quotation
marks. For example, PATH(node_name).
Note: The function ignores alias if it is the same as an input
tbl_teradata column name.
For the path from the "starts.with" node to the last traversed node,
the operations do the following:
PATH: Outputs the value of expression for each node, separating
values with "->".
SUM: Computes the value of expression for each node and outputs the
sum of these values.
LEVEL: Outputs the number of hops.
MAX: Computes the value of expression for each node and outputs the
highest of these values.
MIN: Computes the value of expression for each node and outputs the
lowest of these values.
IS_CYCLE: Outputs the cycle (if any).
AVG: Computes the value of expression for each node and outputs the
average of these values.
PROPAGATE: Evaluates expression with the value of the 'starts.with'
node and propagates the result to every node.
|
data.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
|
Value
Function returns an object of class "td_ntree_mle" which is a named list
containing Teradata tbl object.
Named list member can be referenced directly with the "$" operator
using name: result.
Examples
# Get the current context/connection
con <- td_get_context()$connection
# Load example data
loadExampleData("ntree_example", "employee_table", "emp_table_by_dept")
# Create remote tibble objects
employee_table <- tbl(con, "employee_table")
emp_table_by_dept <- tbl(con, "emp_table_by_dept")
# Example 1 - This example finds the employees who report to employee
# 100 (either directly or indirectly) by traversing the tree
# of employees from employee 100 downward.
td_ntree_out1 <- td_ntree_mle(data=employee_table,
root.node = 'mgr_id is NULL',
node.id='emp_id',
parent.id='mgr_id',
starts.with='emp_id=100',
mode='down',
output='end',
result='PATH(emp_name) AS path'
)
# Example 2 - This example finds the reporting structure by department.
td_ntree_out2 <- td_ntree_mle(data=emp_table_by_dept,
data.partition.column='department',
root.node = "mgr_id = 'none'",
node.id='id',
parent.id='mgr_id',
starts.with='root',
mode='down',
output='all',
result='PATH(name) AS path, PATH(id) as path2'
)