| |
Methods defined here:
- __init__(self, vertices_data=None, edges_data=None, sources_data=None, target_key=None, edge_weight=None, community_association=None, resolution=1.0, seed=1, accumulate=None, vertices_data_sequence_column=None, edges_data_sequence_column=None, sources_data_sequence_column=None, vertices_data_partition_column=None, edges_data_partition_column=None, sources_data_partition_column=None)
- DESCRIPTION:
The Modularity function uses a clustering algorithm to detect
communities in networks (graphs). The function needs no prior knowledge or
estimation of starting cluster centers and assumes no particular data distribution of the
input data set.
PARAMETERS:
vertices_data:
Required Argument.
Specifies the vertex teradataml DataFrame, in which each row represents a vertex of the graph.
vertices_data_partition_column:
Required Argument.
Specifies the Partition By columns for vertices_data.
Provide the value as a list if multiple columns are used for
partitioning.
Types: str OR list of Strings (str)
edges_data:
Required Argument.
Specifies the edge teradataml DataFrame, in which each row represents an edge of the graph.
edges_data_partition_column:
Required Argument.
Specifies the Partition By columns for edges_data.
Provide the value as a list if multiple columns are used for
partitioning.
Types: str OR list of Strings (str)
sources_data:
Optional Argument.
Specifies the source vertices teradataml DataFrame, which is intended for directed graphs.
The function ignores this teradataml DataFrame and treats all graphs as undirected.
sources_data_partition_column:
Optional Argument. Required when 'sources_data' argument is specified.
Specifies the Partition By columns for sources_data.
Provide the value as a list if multiple columns are used for
partitioning.
Types: str OR list of Strings (str)
target_key:
Required Argument.
Specifies the key of the target vertex of an edge. The key consists
of the names of one or more columns of the edges teradataml DataFrame.
Types: str OR list of Strings (str)
edge_weight:
Optional Argument.
Specifies the name of the edges teradataml DataFrame column that
contains edge weights. The weights are positive values. By default,
the weight of each edge is 1 (that is, the graph is unweighted). This
argument determines how the function treats duplicate edges (that is,
edges with the same source and destination, which might have
different weights). For a weighted graph, the function treats
duplicate edges as a single edge whose weight is the sum of the
weights of the duplicate edges. For an unweighted graph, the function
uses only one of the duplicate edges.
Types: str
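The duplicate-edge rule described for edge_weight can be illustrated with a small, self-contained sketch. This is not the function's internal implementation. The helper name collapse_duplicate_edges and the sample edges are hypothetical, and the sketch assumes duplicates are matched on the ordered (source, target) pair.

```python
from collections import defaultdict


def collapse_duplicate_edges(edges, weighted):
    """Collapse duplicate (source, target) edges (illustrative helper only).

    edges: iterable of (source, target, weight) tuples.
    weighted: True  -> sum the weights of duplicate edges;
              False -> keep one edge per pair with the default weight 1.
    """
    collapsed = defaultdict(float)
    for source, target, weight in edges:
        if weighted:
            # Weighted graph: duplicates merge into one edge with summed weight.
            collapsed[(source, target)] += weight
        else:
            # Unweighted graph: only one of the duplicates is used, weight 1.
            collapsed[(source, target)] = 1.0
    return dict(collapsed)


# Hypothetical sample data: the first two edges are duplicates.
edges = [("ann", "bob", 2.0), ("ann", "bob", 3.0), ("bob", "cy", 1.5)]
weighted_result = collapse_duplicate_edges(edges, weighted=True)
unweighted_result = collapse_duplicate_edges(edges, weighted=False)
```

In the weighted case the two ann-to-bob edges become one edge of weight 5.0; in the unweighted case they become one edge of weight 1.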
community_association:
Optional Argument.
Specifies the name of the column that represents the community
association of the vertices. Use this argument if you already know
some vertex communities.
Types: str
resolution:
Optional Argument.
Specifies hierarchical-level information for the communities. If you
specify a list of resolution values, the function incrementally finds
the communities for each value and for the default value. Each resolution
must be a distinct float value in the range [0.0, 1000000.0]. The value 0.0
puts each node in its own community of size 1. You can specify a maximum of 500
resolution values. To get the modularity of more than 500 resolution
points, call the function multiple times, specifying different values
in each call.
Default Value: 1.0
Types: float OR list of floats
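The constraints on resolution (distinct values, range [0.0, 1000000.0], at most 500 values per call) can be checked on the client side before calling the function. The following is a minimal sketch under those stated constraints. The helpers normalize_resolution and chunk_resolutions are hypothetical and are not part of teradataml.

```python
MAX_RESOLUTION_VALUES = 500
MAX_RESOLUTION = 1000000.0


def normalize_resolution(resolution):
    """Return resolution as a validated list of floats (illustrative helper)."""
    if isinstance(resolution, (list, tuple)):
        values = [float(v) for v in resolution]
    else:
        values = [float(resolution)]
    if len(values) > MAX_RESOLUTION_VALUES:
        raise ValueError("at most 500 resolution values are allowed per call")
    if len(set(values)) != len(values):
        raise ValueError("resolution values must be distinct")
    for value in values:
        if not 0.0 <= value <= MAX_RESOLUTION:
            raise ValueError(f"resolution {value} is outside [0.0, 1000000.0]")
    return values


def chunk_resolutions(values, size=MAX_RESOLUTION_VALUES):
    """Split a long list of resolutions into batches, one batch per call."""
    return [values[i:i + size] for i in range(0, len(values), size)]


# 1200 distinct resolution points need three separate calls.
batches = chunk_resolutions([i / 10 for i in range(1200)])
```

Each batch can then be passed as the resolution argument of a separate Modularity call.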
seed:
Optional Argument.
Specifies the seed used to generate random numbers during modularity
computation. The seed must be a positive BIGINT value. The function
multiplies seed by the hash code of vertex_key to generate a unique
seed for each vertex. The seed significantly impacts community formation
(and modularity score), because the function uses seed for these purposes:
• To break ties between different vertices during community formation.
• To determine how deeply to analyze the graph. Deeper analysis of the graph
can improve community formation, but can also increase execution time.
Default Value: 1
Types: int
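The per-vertex seeding described for seed (the seed multiplied by the hash code of vertex_key) can be sketched as follows. The hash function the engine uses is not documented here, so zlib.crc32 is used purely as a stand-in assumption, and per_vertex_seed is a hypothetical helper.

```python
import zlib


def per_vertex_seed(seed, vertex_key):
    """Derive a vertex-specific seed (illustrative only).

    Assumption: zlib.crc32 stands in for the engine's hash code, which is
    not specified in this documentation.
    """
    if not isinstance(seed, int) or seed <= 0:
        raise ValueError("seed must be a positive integer")
    hash_code = zlib.crc32(str(vertex_key).encode("utf-8"))
    return seed * hash_code


# The same seed and vertex key always give the same derived seed, while
# different vertex keys give different derived seeds.
seed_ann = per_vertex_seed(1, "ann")
seed_bob = per_vertex_seed(1, "bob")
```

Because the derived seed depends only on seed and the vertex key, repeating a run with the same seed reproduces the same per-vertex values in this sketch.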
accumulate:
Optional Argument.
Specifies the names of the vertices columns to copy to the community
vertex teradataml DataFrame. By default, the function copies the vertex_key
columns to the output vertex teradataml DataFrame for each vertex, changing
the column names to id, id_1, id_2, and so on.
Types: str OR list of Strings (str)
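The default renaming of the vertex_key columns (id, id_1, id_2, and so on) can be sketched as below. The helper output_key_names and the sample column names are hypothetical and are shown only to make the naming pattern concrete.

```python
def output_key_names(vertex_key_columns):
    """Map vertex key columns to the documented default output names."""
    return [
        "id" if position == 0 else f"id_{position}"
        for position in range(len(vertex_key_columns))
    ]


# Hypothetical three-column vertex key.
renamed = dict(zip(["first_name", "last_name", "city"],
                   output_key_names(["first_name", "last_name", "city"])))
```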
vertices_data_sequence_column:
Optional Argument.
Specifies the column or list of columns that uniquely identify each row of
the input argument "vertices_data". This argument ensures
deterministic results for functions whose results vary
from run to run.
Types: str OR list of Strings (str)
edges_data_sequence_column:
Optional Argument.
Specifies the column or list of columns that uniquely identify each row of
the input argument "edges_data". This argument ensures
deterministic results for functions whose results vary
from run to run.
Types: str OR list of Strings (str)
sources_data_sequence_column:
Optional Argument.
Specifies the column or list of columns that uniquely identify each row of
the input argument "sources_data". This argument ensures
deterministic results for functions whose results vary
from run to run.
Types: str OR list of Strings (str)
RETURNS:
Instance of Modularity.
Output teradataml DataFrames can be accessed using attribute
references, such as ModularityObj.<attribute_name>.
Output teradataml DataFrame attribute names are:
1. community_edge_data
2. output
RAISES:
TeradataMlException
EXAMPLES:
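# Import the required names. Assumption: the top-level teradataml package
# exports all three; adjust the import path for your teradataml version.
from teradataml import DataFrame, load_example_data, Modularity
# Load example data (requires an active connection to Vantage).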
# The examples use a graph in which nodes represent persons who are geographically distributed
# across the United States and are connected on an online social network, where they follow each other.
# The directed edges start at the follower and end at the leader.
load_example_data("modularity", ["friends", "followers_leaders"])
# Create teradataml DataFrame objects.
friends = DataFrame.from_table("friends")
followers_leaders = DataFrame.from_table("followers_leaders")
# Example 1 - Unweighted Edges.
# Followers follow leaders with equal intensity (all edges have default weight 1).
Modularity_out1 = Modularity(vertices_data = friends,
vertices_data_partition_column = ["friends_name"],
edges_data = followers_leaders,
edges_data_partition_column = ["follower"],
target_key = ["leader"],
community_association = "group_id",
accumulate = ["friends_name","location"]
)
# Print the results
print(Modularity_out1)
# Example 2 - Weighted Edges and Community Edge Table.
# Followers follow leaders with different intensity.
Modularity_out2 = Modularity(vertices_data = friends,
vertices_data_partition_column = ["friends_name"],
edges_data = followers_leaders,
edges_data_partition_column = ["follower"],
target_key = ["leader"],
edge_weight = "intensity",
community_association = "group_id",
accumulate = ["friends_name","location"]
)
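# Print the results.
print(Modularity_out2)
# Access the individual output teradataml DataFrames by attribute.
print(Modularity_out2.output)
print(Modularity_out2.community_edge_data)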
- __repr__(self)
- Returns the string representation for a Modularity class instance.
- get_build_time(self)
- Returns the build time of the algorithm in seconds.
When the model object is created using retrieve_model(), the returned value is
the one saved in the Model Catalog.
- get_prediction_type(self)
- Returns the prediction type of the algorithm.
When the model object is created using retrieve_model(), the returned value is
the one saved in the Model Catalog.
- get_target_column(self)
- Returns the target column of the algorithm.
When the model object is created using retrieve_model(), the returned value is
the one saved in the Model Catalog.
- show_query(self)
- Returns the underlying SQL query.
When the model object is created using retrieve_model(), None is returned.
|