Teradata Package for Python Function Reference - Closeness

teradataml.analytics.mle.Closeness = class Closeness(builtins.object)

teradataml.analytics.mle.Closeness(vertices_data=None, edges_data=None, target_key=None, sources_data=None, targets_data=None, directed=True, edge_weight=None, max_distance=10, group_size=None, sample_rate=1.0, seed=None, accumulate=None, vertices_data_sequence_column=None, edges_data_sequence_column=None, sources_data_sequence_column=None, targets_data_sequence_column=None, vertices_data_partition_column=None, edges_data_partition_column=None, sources_data_partition_column=None, targets_data_partition_column=None, vertices_data_order_column=None, edges_data_order_column=None, sources_data_order_column=None, targets_data_order_column=None)

Methods defined here:

__init__(self, vertices_data=None, edges_data=None, target_key=None, sources_data=None, targets_data=None, directed=True, edge_weight=None, max_distance=10, group_size=None, sample_rate=1.0, seed=None, accumulate=None, vertices_data_sequence_column=None, edges_data_sequence_column=None, sources_data_sequence_column=None, targets_data_sequence_column=None, vertices_data_partition_column=None, edges_data_partition_column=None, sources_data_partition_column=None, targets_data_partition_column=None, vertices_data_order_column=None, edges_data_order_column=None, sources_data_order_column=None, targets_data_order_column=None)
    DESCRIPTION:
        The Closeness function returns closeness and k-degree scores for each
        specified source vertex in a graph. The closeness scores are the
        inverse of the sum, the inverse of the average, and the sum of the
        inverses of the shortest distances to all reachable target vertices
        (excluding the source vertex itself). The graph can be directed or
        undirected, weighted or unweighted.

    PARAMETERS:
        vertices_data:
            Required Argument.
            Specifies the teradataml DataFrame where each row represents a
            vertex of the graph.
            Types: teradataml DataFrame

        vertices_data_partition_column:
            Required Argument.
            Specifies Partition By columns for vertices_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for partition.
            Types: str OR list of Strings (str)

        vertices_data_order_column:
            Optional Argument.
            Specifies Order By columns for vertices_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        edges_data:
            Required Argument.
            Specifies the teradataml DataFrame where each row represents an
            edge of the graph.
            Types: teradataml DataFrame

        edges_data_partition_column:
            Required Argument.
            Specifies Partition By columns for edges_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for partition.
            Types: str OR list of Strings (str)

        edges_data_order_column:
            Optional Argument.
            Specifies Order By columns for edges_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        target_key:
            Required Argument.
            Specifies the target key (the names of the edges_data teradataml
            DataFrame columns that identify the target vertex). If you specify
            targets_data, the function uses only the vertices in targets_data
            as targets; these must be a subset of the vertices that this
            argument identifies.
            Types: str OR list of Strings (str)

        sources_data:
            Required for a directed graph, optional for an undirected graph.
            Specifies the teradataml DataFrame that contains the vertices to
            use as sources.
            Types: teradataml DataFrame

        sources_data_partition_column:
            Required Argument when sources_data is used.
            Specifies Partition By columns for sources_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for partition.
            Types: str OR list of Strings (str)

        sources_data_order_column:
            Optional Argument.
            Specifies Order By columns for sources_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        targets_data:
            Required for a directed graph, optional for an undirected graph.
            Specifies the teradataml DataFrame that contains the vertices to
            use as targets.
            Types: teradataml DataFrame

        targets_data_partition_column:
            Required Argument when targets_data is used.
            Specifies Partition By columns for targets_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for partition.
            Types: str OR list of Strings (str)

        targets_data_order_column:
            Optional Argument.
            Specifies Order By columns for targets_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        directed:
            Optional Argument.
            Specifies whether the graph is directed.
            Default Value: True
            Types: bool

        edge_weight:
            Optional Argument.
            Specifies the name of the edges_data teradataml DataFrame column
            that contains edge weights. The weights must be positive values.
            By default, the weight of each edge is 1 (that is, the graph is
            unweighted).
            Types: str

        max_distance:
            Optional Argument.
            Specifies the maximum distance between the source and target
            vertices. A negative max_distance specifies an infinite distance.
            If vertices are separated by more than max_distance, the function
            does not output them.
            Default Value: 10
            Types: int

        group_size:
            Optional Argument.
            Specifies the number of source vertices that execute a single-node
            shortest path (SNSP) algorithm in parallel. If group_size exceeds
            s, the number of source vertices in each partition, the function
            uses s as the group size. By default, the function calculates the
            optimal group size based on various cluster and query
            characteristics. Running a group of vertices on each vWorker, in
            parallel, uses less memory than running all vertices on each
            vWorker.
            Types: int

        sample_rate:
            Optional Argument.
            Specifies the sample rate (the fraction of source vertices to
            sample), a numeric value in the range (0, 1].
            Default Value: 1.0
            Types: float

        seed:
            Optional Argument.
            Specifies the random seed to use for deterministic results.
            Types: int

        accumulate:
            Optional Argument.
            Specifies the names of the vertices_data teradataml DataFrame
            columns to copy to the output teradataml DataFrame.
            Types: str OR list of Strings (str)

        vertices_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identify each row of
            the input argument "vertices_data". The argument is used to ensure
            deterministic results for functions that produce results that vary
            from run to run.
            Types: str OR list of Strings (str)

        edges_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identify each row of
            the input argument "edges_data". The argument is used to ensure
            deterministic results for functions that produce results that vary
            from run to run.
            Types: str OR list of Strings (str)

        sources_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identify each row of
            the input argument "sources_data". The argument is used to ensure
            deterministic results for functions that produce results that vary
            from run to run.
            Types: str OR list of Strings (str)

        targets_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identify each row of
            the input argument "targets_data". The argument is used to ensure
            deterministic results for functions that produce results that vary
            from run to run.
            Types: str OR list of Strings (str)

    RETURNS:
        Instance of Closeness.
        Output teradataml DataFrames can be accessed using attribute
        references, such as ClosenessObj.<attribute_name>.
        Output teradataml DataFrame attribute name is:
            result

    RAISES:
        TeradataMlException

    EXAMPLES:
        # Import the function and helpers. These examples assume that a
        # connection to Vantage has already been created with create_context().
        from teradataml import DataFrame, load_example_data
        from teradataml.analytics.mle import Closeness

        # Load the data to run the example.
        load_example_data("Closeness", ["callers", "calls"])

        # Create teradataml DataFrame objects.
        callers = DataFrame.from_table("callers")
        calls = DataFrame.from_table("calls")
        sources = DataFrame.from_query("select * from callers where callerid <= 3")
        target = DataFrame.from_query("select * from callers where callerid > 3")

        # Example 1 - Run Closeness on an unweighted, unbounded graph.
        closeness_out1 = Closeness(vertices_data=callers,
                                   vertices_data_partition_column='callerid',
                                   edges_data=calls,
                                   edges_data_partition_column='callerfrom',
                                   sources_data=sources,
                                   sources_data_partition_column='callerid',
                                   targets_data=target,
                                   targets_data_partition_column='callerid',
                                   target_key='callerto',
                                   accumulate=['callerid', 'callername'],
                                   max_distance=-1,
                                   edges_data_sequence_column='callerfrom',
                                   vertices_data_sequence_column='callerid'
                                   )

        # Print the output DataFrame.
        print(closeness_out1.result)

        # Example 2 - Run Closeness on a weighted graph bounded by
        # max_distance=12.
        closeness_out2 = Closeness(vertices_data=callers,
                                   vertices_data_partition_column='callerid',
                                   edges_data=calls,
                                   edges_data_partition_column='callerfrom',
                                   sources_data=sources,
                                   sources_data_partition_column='callerid',
                                   targets_data=target,
                                   targets_data_partition_column='callerid',
                                   target_key='callerto',
                                   edge_weight='calls',
                                   accumulate=['callerid', 'callername'],
                                   max_distance=12,
                                   edges_data_sequence_column='callerfrom',
                                   vertices_data_sequence_column='callerid'
                                   )

        # Print the output DataFrame.
        print(closeness_out2.result)

        # Example 3 - Run Closeness on a weighted graph bounded by
        # max_distance=8.
        closeness_out3 = Closeness(vertices_data=callers,
                                   vertices_data_partition_column='callerid',
                                   edges_data=calls,
                                   edges_data_partition_column='callerfrom',
                                   sources_data=sources,
                                   sources_data_partition_column='callerid',
                                   targets_data=target,
                                   targets_data_partition_column='callerid',
                                   target_key='callerto',
                                   edge_weight='calls',
                                   accumulate=['callerid', 'callername'],
                                   max_distance=8,
                                   edges_data_sequence_column='callerfrom',
                                   vertices_data_sequence_column='callerid'
                                   )

        # Print the output DataFrame.
        print(closeness_out3.result)

        # Example 4 - Run Closeness on an unweighted, unbounded graph
        # without sources_data and targets_data.
        closeness_out4 = Closeness(vertices_data=callers,
                                   vertices_data_partition_column='callerid',
                                   edges_data=calls,
                                   edges_data_partition_column='callerfrom',
                                   target_key='callerto',
                                   accumulate=['callerid', 'callername'],
                                   max_distance=-1,
                                   edges_data_sequence_column='callerfrom',
                                   vertices_data_sequence_column='callerid'
                                   )

        # Print the output DataFrame.
        print(closeness_out4.result)
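To make the three closeness definitions concrete, the following is a small in-memory Python sketch, not Teradata's implementation (the function itself runs a parallel single-node shortest path algorithm inside the database). For a source s whose reachable targets T exclude s itself, with shortest distances d(s, t), it computes 1 / sum(d), |T| / sum(d), and sum(1 / d). It also mirrors the directed, edge_weight (weights on each edge, 1 for an unweighted graph), max_distance (negative means unbounded), and targets_data behavior described above. The helper names, the dictionary keys, reading k-degree as the number of reachable targets, and returning zeros when nothing is reachable are all assumptions made for illustration.

```python
import heapq
import itertools
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Set, Tuple

# (from_vertex, to_vertex, weight); use weight 1 everywhere for an unweighted graph.
Edge = Tuple[Hashable, Hashable, float]


def shortest_distances(edges: Iterable[Edge], source: Hashable,
                       directed: bool = True,
                       max_distance: float = 10) -> Dict[Hashable, float]:
    """Dijkstra's algorithm from `source`, keeping only vertices within max_distance.

    A negative max_distance is treated as "no bound" (infinite distance).
    Weights are assumed positive, as the edge_weight argument requires.
    """
    adj: Dict[Hashable, List[Tuple[Hashable, float]]] = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        if not directed:
            adj[v].append((u, w))  # undirected: each edge is traversable both ways
    limit = float("inf") if max_distance < 0 else max_distance
    dist: Dict[Hashable, float] = {source: 0.0}
    tie = itertools.count()  # tie-breaker so vertices never need to be comparable
    heap = [(0.0, next(tie), source)]
    while heap:
        d, _, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: a shorter path to u was already settled
        for v, w in adj[u]:
            nd = d + w
            if nd <= limit and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, next(tie), v))
    return dist


def closeness_scores(edges: Iterable[Edge], source: Hashable,
                     targets: Optional[Set[Hashable]] = None,
                     directed: bool = True,
                     max_distance: float = 10) -> Dict[str, float]:
    """Hypothetical closeness scores of `source` over its reachable targets."""
    dist = shortest_distances(edges, source, directed, max_distance)
    # Exclude the source itself; optionally restrict to a target subset.
    reach = [d for v, d in dist.items()
             if v != source and (targets is None or v in targets)]
    if not reach:
        # Assumption: report zeros when no target is reachable.
        return {"inv_sum": 0.0, "inv_avg": 0.0, "sum_inv": 0.0, "k_degree": 0}
    total = sum(reach)
    return {
        "inv_sum": 1.0 / total,                  # inverse of the sum of distances
        "inv_avg": len(reach) / total,           # inverse of the average distance
        "sum_inv": sum(1.0 / d for d in reach),  # sum of inverse distances
        "k_degree": len(reach),                  # assumed: count of reachable targets
    }
```

For example, with edges [(1, 2, 1), (2, 3, 1), (1, 4, 2)] in a directed graph, source 1 reaches vertices 2, 3, and 4 at distances 1, 2, and 2, so the sketch reports an inverse sum of 0.2, an inverse average of 0.6, a sum of inverses of 2.0, and a k-degree of 3. With max_distance=1, only vertex 2 remains.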

__repr__(self)
    Returns the string representation for a Closeness class instance.

get_build_time(self)
    Function to return the build time of the algorithm in seconds.
    When the model object is created using retrieve_model(), the value
    returned is the one saved in the Model Catalog.

get_prediction_type(self)
    Function to return the prediction type of the algorithm.
    When the model object is created using retrieve_model(), the value
    returned is the one saved in the Model Catalog.

get_target_column(self)
    Function to return the target column of the algorithm.
    When the model object is created using retrieve_model(), the value
    returned is the one saved in the Model Catalog.

show_query(self)
    Function to return the underlying SQL query.
    When the model object is created using retrieve_model(), None is
    returned.