Teradata R Package Function Reference - Modularity

Teradata® R Package Function Reference

Product: Teradata R Package
Release Number: 16.20
Published: February 2020
Language: English (United States)
Last Update: 2020-02-28
dita:id: B700-4007
lifecycle: previous
Product Category: Teradata Vantage

Description

The Modularity (td_modularity_mle) function uses a clustering algorithm to detect communities in networks (graphs). The function needs no prior knowledge or estimation of starting cluster centers and assumes no particular data distribution of the input data set.
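For orientation, the sketch below computes a modularity score for a fixed community assignment on a small undirected graph, using base R only. It assumes the standard Newman-Girvan definition at resolution 1; this page does not state which exact formula td_modularity_mle uses, so the code illustrates the concept and is not the function's implementation. The function name modularity_score and the sample graph are hypothetical.

```r
# Modularity of a given partition (assumed Newman-Girvan definition):
#   Q = sum over communities c of  L_c / m - (d_c / (2 * m))^2
# where m is the total edge weight, L_c is the weight of edges inside c,
# and d_c is the summed degree of the vertices in c.
modularity_score <- function(edges, membership) {
  from <- as.character(edges$from)
  to <- as.character(edges$to)
  w <- edges$weight
  m <- sum(w)
  comm_from <- membership[from]   # community of each edge's first endpoint
  comm_to <- membership[to]       # community of each edge's second endpoint
  per_community <- sapply(unique(membership), function(cm) {
    internal <- sum(w[comm_from == cm & comm_to == cm])
    degree <- sum(w[comm_from == cm]) + sum(w[comm_to == cm])
    internal / m - (degree / (2 * m))^2
  })
  sum(per_community)
}

# Hypothetical graph: two triangles (a, b, c) and (d, e, f) joined by edge c-d.
edges <- data.frame(
  from = c("a", "a", "b", "d", "d", "e", "c"),
  to = c("b", "c", "c", "e", "f", "f", "d"),
  weight = 1
)
membership <- c(a = 1, b = 1, c = 1, d = 2, e = 2, f = 2)

q <- modularity_score(edges, membership)
print(q)  # 5/14, about 0.357
```

A higher score means more edge weight falls inside communities than a random graph with the same degrees would be expected to have. Putting every vertex in one community gives a score of 0 under this definition.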

Usage

  td_modularity_mle (
      vertices.data = NULL,
      edges.data = NULL,
      sources.data = NULL,
      target.key = NULL,
      edge.weight = NULL,
      community.association = NULL,
      resolution = 1,
      seed = 1,
      accumulate = NULL,
      vertices.data.sequence.column = NULL,
      edges.data.sequence.column = NULL,
      sources.data.sequence.column = NULL,
      vertices.data.partition.column = NULL,
      edges.data.partition.column = NULL,
      sources.data.partition.column = NULL
  )

Arguments

vertices.data

Required Argument.
Specifies the vertex tbl_teradata where each row represents a vertex of the graph.

vertices.data.partition.column

Required Argument.
Specifies Partition By columns for vertices.data.
If multiple columns are used for partitioning, provide them as a vector.
Types: character OR vector of Strings (character)

edges.data

Required Argument.
Specifies an edge tbl_teradata where each row represents an edge of the graph.

edges.data.partition.column

Required Argument.
Specifies Partition By columns for edges.data.
If multiple columns are used for partitioning, provide them as a vector.
Types: character OR vector of Strings (character)

sources.data

Optional Argument.
Specifies source vertices. This is a legacy tbl_teradata, which was formerly required for directed graphs. The function ignores this tbl_teradata and treats all graphs as undirected.

sources.data.partition.column

Optional Argument. Required when 'sources.data' argument is specified.
Specifies Partition By columns for sources.data.
If multiple columns are used for partitioning, provide them as a vector.
Types: character OR vector of Strings (character)

target.key

Required Argument.
Specifies the key of the target vertex of an edge. The key consists of the names of one or more columns of the edges tbl_teradata.
Types: character OR vector of Strings (character)

edge.weight

Optional Argument.
Specifies the name of the edges tbl_teradata column that contains edge weights. The weights are positive values. By default, the weight of each edge is 1 (that is, the graph is unweighted). This argument determines how the function treats duplicate edges (that is, edges with the same source and destination, which might have different weights). For a weighted graph, the function treats duplicate edges as a single edge whose weight is the sum of the weights of the duplicate edges. For an unweighted graph, the function uses only one of the duplicate edges.
Types: character
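The duplicate-edge rule described above can be sketched locally in base R. This is an illustration of the stated behavior on an in-memory data frame, not the function's internal code. The helper collapse_duplicate_edges and the sample rows are hypothetical; the column names follower, leader, and intensity are borrowed from the Examples section.

```r
# Hypothetical edge list with one duplicated edge (alice -> bob appears twice).
edges <- data.frame(
  follower = c("alice", "alice", "carol"),
  leader = c("bob", "bob", "bob"),
  intensity = c(2, 3, 1)
)

collapse_duplicate_edges <- function(edges, weighted) {
  if (weighted) {
    # Weighted graph: duplicates become one edge whose weight is the sum.
    aggregate(intensity ~ follower + leader, data = edges, FUN = sum)
  } else {
    # Unweighted graph: keep only one of the duplicates, each with weight 1.
    deduped <- unique(edges[, c("follower", "leader")])
    deduped$intensity <- 1
    deduped
  }
}

weighted_edges <- collapse_duplicate_edges(edges, weighted = TRUE)
unweighted_edges <- collapse_duplicate_edges(edges, weighted = FALSE)
```

In the weighted result the alice-bob edge carries weight 5 (2 + 3). In the unweighted result it appears once with weight 1. The sketch compares source and destination exactly as given and does not merge an edge with its reverse.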

community.association

Optional Argument.
Specifies the name of the column that represents the community association of the vertices. Use this argument if you already know some vertex communities.
Types: character

resolution

Optional Argument.
Specifies hierarchical-level information for the communities. The default resolution is 1.0. If you specify a list of resolution values, the function incrementally finds the communities for each value and for the default value. Each resolution must be a distinct numeric value in the range [0.0, 1000000.0]. The value 0.0 puts each node in its own community of size 1. You can specify a maximum of 500 resolution values. To get the modularity of more than 500 resolution points, call the function multiple times, specifying different values in each call.
Default Value: 1
Types: numeric OR vector of numerics
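The constraints on resolution (distinct values, range [0.0, 1000000.0], at most 500 per call) can be checked before calling the function. The sketch below is one possible client-side approach in base R. The helper names validate_resolution and split_resolution are hypothetical and not part of the Teradata R Package.

```r
# Check the documented constraints on resolution values.
validate_resolution <- function(resolution) {
  stopifnot(is.numeric(resolution), !anyNA(resolution))
  stopifnot(all(resolution >= 0), all(resolution <= 1e6))
  stopifnot(anyDuplicated(resolution) == 0)  # values must be distinct
  invisible(resolution)
}

# Split a long vector of resolution values into batches of at most 500,
# so that each batch can be passed to a separate function call.
split_resolution <- function(resolution, max_per_call = 500) {
  validate_resolution(resolution)
  batch_id <- ceiling(seq_along(resolution) / max_per_call)
  unname(split(resolution, batch_id))
}

# Hypothetical request for 1201 resolution points.
batches <- split_resolution(seq(0, 1200, by = 1))
lengths(batches)
```

Each element of batches could then be supplied as the resolution argument of one td_modularity_mle call. Because the function also computes communities for the default value in every call, results for resolution 1 may appear in each batch's output.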

seed

Optional Argument.
Specifies the seed to use to create a random number during modularity computation. The seed must be a positive BIGINT value. The function multiplies seed by the hash code of vertex_key to generate a unique seed for each vertex. The default seed is 1. The seed significantly impacts community formation (and modularity score), because the function uses seed for these purposes:
1. To break ties between different vertices during community formation.
2. To determine how deeply to analyze the graph.
Deeper analysis of the graph can improve community formation, but can also increase execution time.
Default Value: 1
Types: numeric
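Because the seed must be a positive BIGINT but is passed as an R numeric, a quick local check can catch invalid values early. The sketch below is an assumption-laden helper, not part of the package: is_valid_seed is a hypothetical name, and the upper bound of 2^53 reflects the largest range in which R doubles represent every whole number exactly, which is narrower than the BIGINT range.

```r
# Returns TRUE if seed is a single positive whole number that an R double
# can represent exactly (hypothetical client-side check).
is_valid_seed <- function(seed) {
  is.numeric(seed) && length(seed) == 1 && !is.na(seed) &&
    seed >= 1 && seed == floor(seed) && seed <= 2^53
}

is_valid_seed(1)     # default seed
is_valid_seed(0)     # not positive
is_valid_seed(1.5)   # not a whole number
```

Reusing the same seed across runs is the natural way to compare results, since the text above notes that the seed affects both tie-breaking and analysis depth.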

accumulate

Optional Argument.
Specifies the names of the vertices columns to copy to the community vertex tbl_teradata. By default, the function copies the vertex_key columns to the output vertex tbl_teradata for each vertex, changing the column names to id, id_1, id_2, and so on.
Types: character OR vector of Strings (character)

vertices.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "vertices.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

edges.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "edges.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

sources.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "sources.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

Value

The function returns an object of class "td_modularity_mle", which is a named list containing Teradata tbl objects. Named list members can be referenced directly with the "$" operator using the following names:

  1. community.edge.data

  2. output

Examples

    # Load dplyr (provides tbl()) and the Teradata R package (tdplyr).
    library(dplyr)
    library(tdplyr)
    
    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    # The examples use a graph in which nodes represent persons who are geographically distributed 
    # across the United States and are connected on an online social network, where they follow each other.
    # The directed edges start at the follower and end at the leader.
    
    loadExampleData("modularity_example", "friends", "followers_leaders")
    
    # Create remote tibble objects.
    friends <- tbl(con, "friends")
    followers_leaders <- tbl(con, "followers_leaders")
    
    # Example 1 - Unweighted Edges. 
    # Followers follow leaders with equal intensity (all edges have default weight 1).
    td_modularity_out1 <- td_modularity_mle(
        vertices.data = friends,
        vertices.data.partition.column = c("friends_name"),
        edges.data = followers_leaders,
        edges.data.partition.column = c("follower"),
        target.key = c("leader"),
        community.association = "group_id",
        accumulate = c("friends_name", "location")
    )
    
    # Example 2 - Weighted Edges and Community Edge Table.
    # Followers follow leaders with different intensity.
    td_modularity_out2 <- td_modularity_mle(
        vertices.data = friends,
        vertices.data.partition.column = c("friends_name"),
        edges.data = followers_leaders,
        edges.data.partition.column = c("follower"),
        target.key = c("leader"),
        edge.weight = "intensity",
        community.association = "group_id",
        accumulate = c("friends_name", "location")
    )