Teradata Package for Python Function Reference | 17.10 - Modularity

Teradata® Package for Python Function Reference

Product

Teradata Package for Python

Release Number

17.10

Published

April 2022

Language

English (United States)

Last Update

2022-08-19

Product Category

Teradata Vantage

teradataml.analytics.mle.Modularity = class Modularity(builtins.object)

Methods defined here:

__init__(self, vertices_data=None, edges_data=None, sources_data=None, target_key=None, edge_weight=None, community_association=None, resolution=1.0, seed=1, accumulate=None, vertices_data_sequence_column=None, edges_data_sequence_column=None, sources_data_sequence_column=None, vertices_data_partition_column=None, edges_data_partition_column=None, sources_data_partition_column=None)
    DESCRIPTION:
        The Modularity function uses a clustering algorithm to detect communities in networks (graphs). The function needs no prior knowledge or estimation of starting cluster centers and assumes no particular distribution of the input data set.

    PARAMETERS:
        vertices_data:
            Required Argument.
            Specifies the vertex teradataml DataFrame, where each row represents a vertex of the graph.

        vertices_data_partition_column:
            Required Argument.
            Specifies the Partition By columns for vertices_data. If multiple columns are used for partitioning, provide them as a list.
            Types: str OR list of Strings (str)

        edges_data:
            Required Argument.
            Specifies the edge teradataml DataFrame, where each row represents an edge of the graph.

        edges_data_partition_column:
            Required Argument.
            Specifies the Partition By columns for edges_data. If multiple columns are used for partitioning, provide them as a list.
            Types: str OR list of Strings (str)

        sources_data:
            Optional Argument.
            Specifies the source vertices teradataml DataFrame, which is required for a directed graph. However, the function ignores this teradataml DataFrame and treats all graphs as undirected.

        sources_data_partition_column:
            Optional Argument. Required when the 'sources_data' argument is specified.
            Specifies the Partition By columns for sources_data. If multiple columns are used for partitioning, provide them as a list.
            Types: str OR list of Strings (str)

        target_key:
            Required Argument.
            Specifies the key of the target vertex of an edge. The key consists of the names of one or more columns of the edges teradataml DataFrame.
            Types: str OR list of Strings (str)

        edge_weight:
            Optional Argument.
            Specifies the name of the edges teradataml DataFrame column that contains edge weights. The weights are positive values. By default, the weight of each edge is 1 (that is, the graph is unweighted).
            This argument also determines how the function treats duplicate edges (that is, edges with the same source and destination, which might have different weights):
            • For a weighted graph, the function treats duplicate edges as a single edge whose weight is the sum of the weights of the duplicate edges.
            • For an unweighted graph, the function uses only one of the duplicate edges.
            Types: str

        community_association:
            Optional Argument.
            Specifies the name of the column that represents the community association of the vertices. Use this argument if you already know some vertex communities.
            Types: str

        resolution:
            Optional Argument.
            Specifies hierarchical-level information for the communities. If you specify a list of resolution values, the function incrementally finds the communities for each value and for the default value.
            Each resolution must be a distinct float value in the range [0.0, 1000000.0]. The value 0.0 puts each node in its own community of size 1.
            You can specify a maximum of 500 resolution values. To get the modularity for more than 500 resolution points, call the function multiple times, specifying different values in each call.
            Default Value: 1.0
            Types: float OR list of floats

        seed:
            Optional Argument.
            Specifies the seed used to generate random numbers during the modularity computation. The seed must be a positive BIGINT value. The function multiplies the seed by the hash code of vertex_key to generate a unique seed for each vertex.
            The seed significantly affects community formation (and therefore the modularity score), because the function uses it for these purposes:
            • To break ties between different vertices during community formation.
            • To determine how deeply to analyze the graph. Deeper analysis of the graph can improve community formation, but can also increase execution time.
            Default Value: 1
            Types: int

        accumulate:
            Optional Argument.
            Specifies the names of the vertices_data columns to copy to the output community vertex teradataml DataFrame. By default, the function copies the vertex_key columns to the output vertex teradataml DataFrame for each vertex and renames them id, id_1, id_2, and so on.
            Types: str OR list of Strings (str)

        vertices_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identify each row of the input argument "vertices_data". The argument is used to ensure deterministic results for functions that produce results that vary from run to run.
            Types: str OR list of Strings (str)

        edges_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identify each row of the input argument "edges_data". The argument is used to ensure deterministic results for functions that produce results that vary from run to run.
            Types: str OR list of Strings (str)

        sources_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identify each row of the input argument "sources_data". The argument is used to ensure deterministic results for functions that produce results that vary from run to run.
            Types: str OR list of Strings (str)

    RETURNS:
        Instance of Modularity.
        Output teradataml DataFrames can be accessed using attribute references, such as ModularityObj.<attribute_name>.
        Output teradataml DataFrame attribute names are:
            1. community_edge_data
            2. output

    RAISES:
        TeradataMlException

    EXAMPLES:
        # Import the required objects.
        # These examples assume that a connection to Vantage has already been created with create_context().
        from teradataml import DataFrame, load_example_data
        from teradataml.analytics.mle import Modularity

        # Load example data.
        # The examples use a graph in which nodes represent persons who are geographically distributed
        # across the United States and are connected on an online social network, where they follow each other.
        # The directed edges start at the follower and end at the leader.
        load_example_data("modularity", ["friends", "followers_leaders"])

        # Create teradataml DataFrame objects.
        friends = DataFrame.from_table("friends")
        followers_leaders = DataFrame.from_table("followers_leaders")

        # Example 1 - Unweighted Edges.
        # Followers follow leaders with equal intensity (all edges have the default weight 1).
        Modularity_out1 = Modularity(vertices_data = friends,
                                     vertices_data_partition_column = ["friends_name"],
                                     edges_data = followers_leaders,
                                     edges_data_partition_column = ["follower"],
                                     target_key = ["leader"],
                                     community_association = "group_id",
                                     accumulate = ["friends_name", "location"]
                                     )

        # Print the results.
        print(Modularity_out1)

        # Example 2 - Weighted Edges and Community Edge Table.
        # Followers follow leaders with different intensity.
        Modularity_out2 = Modularity(vertices_data = friends,
                                     vertices_data_partition_column = ["friends_name"],
                                     edges_data = followers_leaders,
                                     edges_data_partition_column = ["follower"],
                                     target_key = ["leader"],
                                     edge_weight = "intensity",
                                     community_association = "group_id",
                                     accumulate = ["friends_name", "location"]
                                     )

        # Print the results.
        print(Modularity_out2)
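The following small pure-Python sketch illustrates two points from the documentation: the edge_weight rules for duplicate edges, and the modularity score mentioned under seed. It is not the Teradata implementation. It assumes the standard (Newman-Girvan) modularity definition for an undirected graph at resolution 1.0, and the helper names merge_duplicate_edges and modularity are hypothetical.

Duplicate edges are merged by (source, target) pair, as the edge_weight description states. Each merged edge is then treated as undirected, because the function treats all graphs as undirected.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Optional, Tuple

# Hypothetical edge representation: (source, target, weight or None).
Edge = Tuple[Hashable, Hashable, Optional[float]]


def merge_duplicate_edges(edges: Iterable[Edge], weighted: bool) -> Dict[Tuple[Hashable, Hashable], float]:
    """Collapse edges that share the same (source, target) pair.

    Weighted graph: duplicates become one edge whose weight is the sum.
    Unweighted graph: only one duplicate is kept, with the default weight 1.
    """
    merged: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    for source, target, weight in edges:
        key = (source, target)
        if weighted:
            merged[key] += weight
        else:
            merged[key] = 1.0
    return dict(merged)


def modularity(edges: Dict[Tuple[Hashable, Hashable], float],
               community: Dict[Hashable, Hashable]) -> float:
    """Standard modularity Q of a partition, treating every edge as undirected.

    Q = sum over communities c of [L_c / m - (d_c / (2 * m)) ** 2], where m is
    the total edge weight, L_c is the weight of edges inside c, and d_c is the
    total weighted degree of the vertices in c.
    """
    m = sum(edges.values())
    if m == 0:
        return 0.0
    internal: Dict[Hashable, float] = defaultdict(float)  # L_c per community
    degree: Dict[Hashable, float] = defaultdict(float)    # d_c per community
    for (u, v), w in edges.items():
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles (a-b-c and d-e-f) joined by the single edge c-d,
# plus one duplicate of the a->b edge.
raw_edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
    ("a", "b", 2.0),  # duplicate edge: kept once because the graph is unweighted
]
edges = merge_duplicate_edges(raw_edges, weighted=False)
communities = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
print(round(modularity(edges, communities), 4))  # → 0.3571

# In a weighted graph, the same duplicates are summed instead.
print(merge_duplicate_edges([("a", "b", 1.0), ("a", "b", 2.0)], weighted=True))  # → {('a', 'b'): 3.0}
```

Teradata documents its resolution scale differently (0.0 yields singleton communities), so the sketch does not attempt to model the resolution argument.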

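A single Modularity call accepts at most 500 distinct resolution values, each in the range [0.0, 1000000.0]. To scan a larger grid, you therefore need to split it across several calls.

The sketch below uses a hypothetical helper, resolution_batches, to validate and split a list of values. Each batch could then be passed as the resolution argument of a separate Modularity call. That call appears only as a comment because it needs a live connection.

```python
from typing import List, Sequence

MAX_RESOLUTIONS_PER_CALL = 500  # limit stated in the resolution description
MIN_RESOLUTION, MAX_RESOLUTION = 0.0, 1000000.0


def resolution_batches(values: Sequence[float],
                       batch_size: int = MAX_RESOLUTIONS_PER_CALL) -> List[List[float]]:
    """Split resolution values into batches that each fit in one Modularity call."""
    if not 1 <= batch_size <= MAX_RESOLUTIONS_PER_CALL:
        raise ValueError("batch_size must be between 1 and 500")
    distinct: List[float] = []
    seen = set()
    for value in values:
        value = float(value)
        if not MIN_RESOLUTION <= value <= MAX_RESOLUTION:
            raise ValueError(f"resolution {value} is outside [0.0, 1000000.0]")
        if value not in seen:  # values within a call must be distinct
            seen.add(value)
            distinct.append(value)
    return [distinct[i:i + batch_size] for i in range(0, len(distinct), batch_size)]


grid = [i * 0.01 for i in range(1200)]  # 1200 values from 0.0 to 11.99
batches = resolution_batches(grid)
print([len(b) for b in batches])  # → [500, 500, 200]

# Hypothetical loop; requires a teradataml connection and the example data:
# for batch in batches:
#     result = Modularity(..., resolution=batch)
```

The helper silently drops repeated values rather than raising an error. Raising would be an equally reasonable design.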
__repr__(self): Returns the string representation of a Modularity class instance.

get_build_time(self): Returns the build time of the algorithm, in seconds. When the model object is created using retrieve_model(), the returned value is the one saved in the Model Catalog.

get_prediction_type(self): Returns the prediction type of the algorithm. When the model object is created using retrieve_model(), the returned value is the one saved in the Model Catalog.

get_target_column(self): Returns the target column of the algorithm. When the model object is created using retrieve_model(), the returned value is the one saved in the Model Catalog.

show_query(self): Returns the underlying SQL query. When the model object is created using retrieve_model(), None is returned.
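These getters would typically be called on the object that the constructor returns, such as Modularity_out1 in the examples.

The hypothetical helper below gathers their results in one place and handles the documented None return from show_query(). It relies only on the method names listed here. This section does not say what the prediction-type and target-column getters return for a non-predictive function such as Modularity, so the sketch assumes no particular values.

```python
from typing import Any, Dict


def summarize_function_object(obj: Any) -> Dict[str, Any]:
    """Collect the metadata exposed by the getter methods of an analytic function object.

    show_query() returns None for objects created with retrieve_model(), so the
    SQL text is reported as unavailable in that case.
    """
    query = obj.show_query()
    return {
        "build_time_seconds": obj.get_build_time(),
        "prediction_type": obj.get_prediction_type(),
        "target_column": obj.get_target_column(),
        "sql": query if query is not None else "<not available: retrieved model>",
    }
```

With a live connection, a call such as summarize_function_object(Modularity_out1) would return these four entries for the object from Example 1.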