Description
The Modularity (td_modularity_mle) function uses a clustering
algorithm to detect communities in networks (graphs). The function needs
no prior knowledge or estimation of starting cluster centers and assumes
no particular data distribution of the input data set.
Usage
td_modularity_mle (
vertices.data = NULL,
edges.data = NULL,
sources.data = NULL,
target.key = NULL,
edge.weight = NULL,
community.association = NULL,
resolution = 1,
seed = 1,
accumulate = NULL,
vertices.data.sequence.column = NULL,
edges.data.sequence.column = NULL,
sources.data.sequence.column = NULL,
vertices.data.partition.column = NULL,
edges.data.partition.column = NULL,
sources.data.partition.column = NULL
)
Arguments
vertices.data |
Required Argument.
Specifies the vertex tbl_teradata where each row represents a vertex of the graph.
|
vertices.data.partition.column |
Required Argument.
Specifies Partition By columns for vertices.data.
Values to this argument can be provided as a vector, if multiple
columns are used for partition.
Types: character OR vector of Strings (character)
|
edges.data |
Required Argument.
Specifies an edge tbl_teradata where each row represents an edge of the graph.
|
edges.data.partition.column |
Required Argument.
Specifies Partition By columns for edges.data.
Values to this argument can be provided as a vector, if multiple
columns are used for partition.
Types: character OR vector of Strings (character)
|
sources.data |
Optional Argument.
Specifies source vertices. This is a legacy tbl_teradata that was formerly
required for directed graphs. The function ignores this tbl_teradata and
treats all graphs as undirected.
|
sources.data.partition.column |
Optional Argument. Required when 'sources.data' argument is specified.
Specifies Partition By columns for sources.data.
Values to this argument can be provided as a vector, if multiple
columns are used for partition.
Types: character OR vector of Strings (character)
|
target.key |
Required Argument.
Specifies the key of the target vertex of an edge. The key consists
of the names of one or more edges tbl_teradata columns.
Types: character OR vector of Strings (character)
|
edge.weight |
Optional Argument.
Specifies the name of the edges tbl_teradata column that contains
edge weights. The weights are positive values. By default, the weight
of each edge is 1 (that is, the graph is unweighted). This argument
determines how the function treats duplicate edges (that is, edges
with the same source and destination, which might have different
weights). For a weighted graph, the function treats duplicate edges
as a single edge whose weight is the sum of the weights of the
duplicate edges. For an unweighted graph, the function uses only one
of the duplicate edges.
Types: character
|
community.association |
Optional Argument.
Specifies the name of the column that represents the community
association of the vertices. Use this argument if you already know
some vertex communities.
Types: character
|
resolution |
Optional Argument.
Specifies hierarchical-level information for the communities. The
default resolution is 1.0. If you specify a list of resolution
values, the function incrementally finds the communities for each
value and for the default value. Each resolution must be a distinct
numeric value in the range [0.0, 1000000.0]. The value 0.0 puts each
node in its own community of size 1. You can specify a maximum of 500
resolution values. To get the modularity of more than 500 resolution
points, call the function multiple times, specifying different values
in each call.
Default Value: 1
Types: numeric OR vector of numerics
|
seed |
Optional Argument.
Specifies the seed to use to create a random number during modularity
computation. The seed must be a positive BIGINT value. The function
multiplies seed by the hash code of vertex_key to generate a unique
seed for each vertex. The default seed is 1. The seed significantly
impacts community formation (and modularity score), because the
function uses seed for these purposes:
1. To break ties between different vertices during community formation.
2. To determine how deeply to analyze the graph.
Deeper analysis of the graph can improve community formation,
but can also increase execution time.
Default Value: 1
Types: numeric
|
accumulate |
Optional Argument.
Specifies the names of the vertices columns to copy to the community
vertex tbl_teradata. By default, the function copies the vertex_key columns
to the output vertex tbl_teradata for each vertex, changing the
column names to id, id_1, id_2, and so on.
Types: character OR vector of Strings (character)
|
vertices.data.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "vertices.data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: character OR vector of Strings (character)
|
edges.data.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "edges.data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: character OR vector of Strings (character)
|
sources.data.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "sources.data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: character OR vector of Strings (character)
|
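The duplicate-edge rule described under edge.weight can be illustrated locally. The following is a minimal base R sketch, not the function's actual implementation: it assumes a small hypothetical data.frame whose column names (follower, leader, intensity) mirror the example tables, and it only mimics the documented behavior of summing duplicate weights for a weighted graph and keeping a single copy for an unweighted one.

```r
# Hypothetical local copy of an edges table; the rows are made up for illustration.
edges <- data.frame(
  follower  = c("ann", "ann", "bob"),
  leader    = c("bob", "bob", "cat"),
  intensity = c(2, 3, 1),
  stringsAsFactors = FALSE
)

# Weighted graph: duplicate edges collapse into one edge whose weight is the sum.
weighted <- aggregate(intensity ~ follower + leader, data = edges, FUN = sum)

# Unweighted graph: keep one of the duplicate edges; every edge has weight 1.
unweighted <- unique(edges[, c("follower", "leader")])
unweighted$weight <- 1

print(weighted)
print(unweighted)
```

Running the same aggregation on your own edge data before calling td_modularity_mle may help you anticipate how many distinct edges the function will effectively see.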
Value
Function returns an object of class "td_modularity_mle" which is a named
list containing Teradata tbl objects.
Named list members can be referenced directly with the "$" operator
using the following names:
1. community.edge.data
2. output
Examples
# Get the current context/connection
con <- td_get_context()$connection
# Load example data.
# The examples use a graph in which nodes represent persons who are geographically distributed
# across the United States and are connected on an online social network, where they follow each other.
# The directed edges start at the follower and end at the leader.
loadExampleData("modularity_example", "friends", "followers_leaders")
# Create remote tibble objects.
friends <- tbl(con, "friends")
followers_leaders <- tbl(con, "followers_leaders")
# Example 1 - Unweighted Edges.
# Followers follow leaders with equal intensity (all edges have default weight 1).
td_modularity_out1 <- td_modularity_mle(vertices.data = friends,
                                        vertices.data.partition.column = c("friends_name"),
                                        edges.data = followers_leaders,
                                        edges.data.partition.column = c("follower"),
                                        target.key = c("leader"),
                                        community.association = "group_id",
                                        accumulate = c("friends_name", "location")
                                        )
# Example 2 - Weighted Edges and Community Edge Table.
# Followers follow leaders with different intensity.
td_modularity_out2 <- td_modularity_mle(vertices.data = friends,
                                        vertices.data.partition.column = c("friends_name"),
                                        edges.data = followers_leaders,
                                        edges.data.partition.column = c("follower"),
                                        target.key = c("leader"),
                                        edge.weight = "intensity",
                                        community.association = "group_id",
                                        accumulate = c("friends_name", "location")
                                        )
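The resolution argument accepts at most 500 distinct values in the range [0.0, 1000000.0] per call, so a longer list has to be spread over several calls. The sketch below shows one way to prepare such batches in base R. The helper name split_resolutions and its max_per_call parameter are hypothetical and are not part of tdplyr.

```r
# Hypothetical helper: validate resolution values and split them into batches.
split_resolutions <- function(values, max_per_call = 500) {
  stopifnot(is.numeric(values), !anyNA(values))
  values <- unique(values)  # each resolution must be distinct
  if (any(values < 0 | values > 1000000)) {
    stop("resolution values must lie in [0.0, 1000000.0]")
  }
  # Assign consecutive values to batches of at most max_per_call elements.
  batch_id <- ceiling(seq_along(values) / max_per_call)
  unname(split(values, batch_id))
}

# 1200 resolution points, which exceeds the per-call limit.
batches <- split_resolutions(seq(0.5, 600, by = 0.5))
print(lengths(batches))
```

Each element of batches could then be passed as resolution = batches[[i]] in a separate td_modularity_mle call, with the other arguments unchanged.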