Teradata® R Package Function Reference, 17.00 - Modularity

Product: Teradata R Package
Release: 17.00
Created: September 2020
Category: Programming Reference
Featnum: B700-4007-090K

Description

The Modularity function uses a clustering algorithm to detect communities in networks (graphs). The function needs no prior knowledge or estimation of starting cluster centers and assumes no particular data distribution of the input data set.

Usage

  td_modularity_mle (
      vertices.data = NULL,
      edges.data = NULL,
      sources.data = NULL,
      target.key = NULL,
      edge.weight = NULL,
      community.association = NULL,
      resolution = 1,
      seed = 1,
      accumulate = NULL,
      vertices.data.sequence.column = NULL,
      edges.data.sequence.column = NULL,
      sources.data.sequence.column = NULL,
      vertices.data.partition.column = NULL,
      edges.data.partition.column = NULL,
      sources.data.partition.column = NULL
  )

Arguments

vertices.data

Required Argument.
Specifies the vertex tbl_teradata where each row represents a vertex of the graph.

vertices.data.partition.column

Required Argument.
Specifies Partition By columns for "vertices.data".
Values to this argument can be provided as a vector, if multiple columns are used for partition.
Types: character OR vector of Strings (character)

edges.data

Required Argument.
Specifies the edge tbl_teradata where each row represents an edge of the graph.

edges.data.partition.column

Required Argument.
Specifies Partition By columns for "edges.data".
Values to this argument can be provided as a vector, if multiple columns are used for partition.
Types: character OR vector of Strings (character)

sources.data

Optional Argument.
Specifies the input tbl_teradata containing the source vertices. This is a legacy tbl_teradata, formerly required for directed graphs. The function ignores this tbl_teradata and treats all graphs as undirected.

sources.data.partition.column

Optional Argument. Required when "sources.data" argument is specified.
Specifies Partition By columns for "sources.data".
Values to this argument can be provided as a vector, if multiple columns are used for partition.
Types: character OR vector of Strings (character)

target.key

Required Argument.
Specifies the key of the target vertex of an edge. The key consists of the names of one or more columns in "edges.data".
Types: character OR vector of Strings (character)

edge.weight

Optional Argument.
Specifies the name of the column in "edges.data" that contains edge weights. The weights must be positive values. By default, the weight of each edge is 1 (that is, the graph is unweighted). This argument also determines how the function treats duplicate edges (that is, edges with the same source and destination, which might have different weights). For a weighted graph, the function treats duplicate edges as a single edge whose weight is the sum of the weights of the duplicate edges. For an unweighted graph, the function uses only one of the duplicate edges.
Types: character
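
The duplicate-edge rule described above can be illustrated locally, without a database connection. The following is a minimal sketch in base R, not the function's actual implementation: the data frame rows are made up, and only the column names ("follower", "leader", "intensity") are borrowed from the Examples section.

```r
# Hypothetical local edge list; the rows are invented for illustration.
edges <- data.frame(
  follower  = c("alice", "alice", "bob"),
  leader    = c("bob",   "bob",   "carol"),
  intensity = c(2, 3, 1),
  stringsAsFactors = FALSE
)

# Weighted graph: duplicate (follower, leader) pairs collapse into one edge
# whose weight is the sum of the duplicates' weights.
weighted <- aggregate(intensity ~ follower + leader, data = edges, FUN = sum)

# Unweighted graph: only one edge per (follower, leader) pair is kept,
# and every edge carries the default weight 1.
unweighted <- unique(edges[, c("follower", "leader")])
unweighted$weight <- 1

print(weighted)
print(unweighted)
```

In this sketch the two alice-to-bob edges become a single edge of weight 5 in the weighted case and a single edge of weight 1 in the unweighted case.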

community.association

Optional Argument.
Specifies the name of the column that represents the community association of the vertices. Use this argument if you already know some vertex communities.
Types: character

resolution

Optional Argument.
Specifies hierarchical-level information for the communities. The default resolution is 1.0. If you specify a list of resolution values, the function incrementally finds the communities for each value and for the default value. Each resolution must be a distinct numeric value in the range [0.0, 1000000.0]. The value 0.0 puts each node in its own community of size 1. You can specify a maximum of 500 resolution values. To get the modularity of more than 500 resolution points, call the function multiple times, specifying different values in each call.
Default Value: 1
Types: numeric OR vector of numerics
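
When more than 500 resolution values are needed, the values have to be spread over several calls. The sketch below shows one way to check a resolution vector against the constraints stated above and split it into batches. The helper name split_resolutions and its max_per_call parameter are hypothetical and not part of the Teradata R package.

```r
# Hypothetical helper (not part of the Teradata R package): validate a vector
# of resolution values and split it into batches of at most `max_per_call`.
split_resolutions <- function(values, max_per_call = 500) {
  stopifnot(
    is.numeric(values),
    !anyDuplicated(values),            # each resolution must be distinct
    all(values >= 0, values <= 1e6)    # allowed range is [0.0, 1000000.0]
  )
  # Assign consecutive values to batch 1, 2, 3, ... by position.
  split(values, ceiling(seq_along(values) / max_per_call))
}

resolutions <- seq(0.5, 600, by = 0.5)  # 1200 distinct values
batches <- split_resolutions(resolutions)
print(lengths(batches))                 # number of values in each batch
```

Each element of batches could then be passed as the "resolution" argument of a separate td_modularity_mle call. Whether the default value 1.0 counts toward the 500-value limit is not stated here, so a smaller max_per_call may be the safer choice.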

seed

Optional Argument.
Specifies the seed to use to create a random number during modularity computation. The seed must be a positive BIGINT value. The function multiplies seed by the hash code of "vertices.data.partition.column" to generate a unique seed for each vertex. The seed significantly impacts community formation (and modularity score), because the function uses seed for these purposes:

  1. To break ties between different vertices during community formation.

  2. To determine how deeply to analyze the graph.

Deeper analysis of the graph can improve community formation, but can also increase execution time.
Default Value: 1
Types: numeric

accumulate

Optional Argument.
Specifies the names of the columns in "vertices.data" to copy to the output tbl_teradata "output". By default, the function copies the "vertices.data.partition.column" columns to the output tbl_teradata "output" for each vertex, changing the column names to id, id_1, id_2, and so on.
Types: character OR vector of Strings (character)

vertices.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "vertices.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

edges.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "edges.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

sources.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "sources.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

Value

The function returns an object of class "td_modularity_mle", which is a named list containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator using the following names:

  1. community.edge.data

  2. output
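
As a minimal sketch of member access, the snippet below uses a stand-in object so that it runs without a database connection. The stand-in is an assumption made for illustration: a real result holds tbl_teradata objects, not empty data frames.

```r
# Stand-in for a td_modularity_mle() result; a real result contains
# tbl_teradata objects instead of empty data frames.
res <- structure(
  list(community.edge.data = data.frame(), output = data.frame()),
  class = "td_modularity_mle"
)

print(names(res))                    # member names available through `$`
vertex_tbl <- res$output             # the "output" member
edge_tbl   <- res$community.edge.data  # the "community.edge.data" member
```

With a live result, the same `$` access would apply to the objects returned in the Examples section, for instance td_modularity_mle_out1$output.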

Examples

    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Load example data.
    # The examples use a graph in which nodes represent persons who are geographically distributed
    # across the United States and are connected on an online social network, where they follow
    # each other.
    # The directed edges start at the follower and end at the leader.
    # Note that the function treats all graphs as undirected.
    
    loadExampleData("modularity_example", "friends", "followers_leaders")
    
    # Create object(s) of class "tbl_teradata".
    friends <- tbl(con, "friends")
    followers_leaders <- tbl(con, "followers_leaders")
    
    # Example 1 - Unweighted Edges.
    # Followers follow leaders with equal intensity (all edges have default weight 1).
    td_modularity_mle_out1 <- td_modularity_mle(vertices.data = friends,
                                                vertices.data.partition.column = c("friends_name"),
                                                edges.data = followers_leaders,
                                                edges.data.partition.column = c("follower"),
                                                target.key = c("leader"),
                                                accumulate = c("friends_name","location"),
                                                community.association = "group_id"
                                                )
    
    # Example 2 - Weighted Edges and Community Edge Table.
    # Followers follow leaders with different intensity.
    td_modularity_mle_out2 <- td_modularity_mle(vertices.data = friends,
                                                vertices.data.partition.column = c("friends_name"),
                                                edges.data = followers_leaders,
                                                edges.data.partition.column = c("follower"),
                                                target.key = c("leader"),
                                                accumulate = c("friends_name","location"),
                                                edge.weight = "intensity",
                                                community.association = "group_id"
                                                )