Teradata R Package Function Reference - RandomWalkSample

Teradata® R Package Function Reference

Product
Teradata R Package
Release Number
16.20
Published
February 2020
Language
English (United States)
Last Update
2020-02-28
dita:id
B700-4007
lifecycle
previous
Product Category
Teradata Vantage

Description

The RandomWalkSample function (td_random_walk_sample_mle) is a graph-sampling technique. It takes an input graph, which is typically large, and outputs a sample subgraph that preserves the properties of the original graph as well as possible.

Usage

  td_random_walk_sample_mle (
     vertices.data = NULL,
     edges.data = NULL,
     target.key = NULL,
     sample.rate = 0.15,
     flyback.rate = 0.15,
     seed = 1000,
     accumulate = NULL,
     vertices.data.sequence.column = NULL,
     edges.data.sequence.column = NULL,
     vertices.data.partition.column = NULL,
     edges.data.partition.column = NULL
  )

Arguments

vertices.data

Required Argument.
Specifies the tbl_teradata containing the vertex table.

vertices.data.partition.column

Required Argument.
Specifies the Partition By columns for 'vertices.data'.
If multiple columns are used for partitioning, provide them as a vector.

edges.data

Required Argument.
Specifies the tbl_teradata containing the edge table.

edges.data.partition.column

Required Argument.
Specifies the Partition By columns for 'edges.data'.
If multiple columns are used for partitioning, provide them as a vector.

target.key

Required Argument.
Specifies the names of the columns in the edges tbl_teradata that identify the target vertex of an edge.

sample.rate

Optional Argument.
Specifies the sampling rate. This value must be in the range (0, 1.0).
Default Value: 0.15

flyback.rate

Optional Argument.
Specifies the chance, when visiting a vertex, of flying back to the starting vertex. This value must be in the range (0, 1.0).
Default Value: 0.15
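
To make the roles of 'sample.rate' and 'flyback.rate' more concrete, the following is a minimal local sketch of a random walk with flyback in base R. It is not the in-database algorithm behind td_random_walk_sample_mle, whose internals are not described here. The helper name random_walk_sample, the max.steps guard, the stopping rule (stop once roughly sample.rate of the vertices have been visited), and the toy ring graph are all assumptions made for illustration; only the column names from_id and to_id are taken from the example data.

```r
# Toy random walk with flyback over an edge list.
# Hypothetical helper for illustration only; NOT the Teradata implementation.
random_walk_sample <- function(edges, vertices, sample.rate = 0.15,
                               flyback.rate = 0.15, seed = 1000,
                               max.steps = 10000) {
  # Both rates must lie in the open interval (0, 1).
  stopifnot(sample.rate > 0, sample.rate < 1,
            flyback.rate > 0, flyback.rate < 1)
  set.seed(seed)  # note: this resets the global RNG state
  # Assumed stopping rule: visit about sample.rate of all vertices.
  target.size <- max(1, ceiling(sample.rate * length(vertices)))
  start <- vertices[sample.int(length(vertices), 1)]
  current <- start
  visited <- start
  for (step in seq_len(max.steps)) {
    if (length(visited) >= target.size) break
    neighbours <- edges$to_id[edges$from_id == current]
    if (length(neighbours) == 0 || runif(1) < flyback.rate) {
      # Fly back to the starting vertex (also used at dead ends).
      current <- start
    } else {
      # Otherwise follow one outgoing edge chosen uniformly at random.
      current <- neighbours[sample.int(length(neighbours), 1)]
      visited <- union(visited, current)
    }
  }
  visited
}

# Toy directed ring graph with 20 vertices (made-up data).
vertices <- 1:20
edges <- data.frame(from_id = 1:20, to_id = c(2:20, 1L))

sampled <- random_walk_sample(edges, vertices)
# Keep only the edges whose endpoints were both sampled.
sub.edges <- edges[edges$from_id %in% sampled & edges$to_id %in% sampled, ]
```

In this sketch a higher 'flyback.rate' keeps the walk closer to the starting vertex, while a higher 'sample.rate' makes the walk continue until more vertices have been collected.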

seed

Optional Argument.
Specifies the seed used to generate the series of random numbers for 'sample.rate', 'flyback.rate', and any random numbers used internally. Specifying this value guarantees that the function result is repeatable on the same cluster.
Default Value: 1000
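
As a local analogy for what a fixed seed buys you, the base R snippet below shows that reseeding the generator with the same value reproduces the same random draws. It uses R's own random number generator, not the one used inside the database, so it only illustrates the general idea of seed-based repeatability.

```r
# Same seed, same sequence of random numbers (local R RNG, for illustration).
set.seed(1000)
first <- runif(3)

set.seed(1000)
second <- runif(3)

same.draws <- identical(first, second)  # TRUE
```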

accumulate

Optional Argument.
Specifies the names of columns in the input vertex tbl_teradata ('vertices.data') to copy to the output vertex tbl_teradata.

vertices.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "vertices.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.

edges.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "edges.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.

Value

The function returns an object of class "td_random_walk_sample_mle", which is a named list containing Teradata tbl objects.
Named list members can be referenced directly with the "$" operator using the following names:

  1. output.vertex.table

  2. output.edge.table

  3. output

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("randomwalksample_example", "citvertices_2", "citedges_2")
    
    # Create remote tibble objects.
    citvertices_2 <- tbl(con, "citvertices_2")
    citedges_2 <- tbl(con, "citedges_2")
    
    # Example 1 - This function takes an input graph (which is typically large) and outputs 
    # a sample graph that preserves graph properties as well as possible.
    td_random_walk_sample_out <- td_random_walk_sample_mle(vertices.data = citvertices_2,
                                                       vertices.data.partition.column = c("id"),
                                                       edges.data = citedges_2,
                                                       edges.data.partition.column = c("from_id"),
                                                       target.key = c("to_id"),
                                                       sample.rate = 0.15,
                                                       flyback.rate = 0.15,
                                                       seed = 1000
                                                       )
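
The result of Example 1 can then be inspected through the named list members described in the Value section. The snippet below is a sketch that continues Example 1 and therefore needs the same live connection; it assumes the returned members behave like ordinary remote tbl objects, so that head() can be used to preview them.

```r
# Continues Example 1: requires td_random_walk_sample_out created above.
# Sampled vertices and edges, as listed in the Value section.
sampled_vertices <- td_random_walk_sample_out$output.vertex.table
sampled_edges <- td_random_walk_sample_out$output.edge.table

# Preview a few rows of each sampled table (assumes standard tbl behaviour).
head(sampled_vertices)
head(sampled_edges)
```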