Closeness Background - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product

Teradata Vantage

Release Number

8.00

1.0

Published

May 2019

Language

English (United States)

Last Update

2019-11-22

Closeness and k-degree scores are fundamental distance-based centrality metrics used in network structure analysis. Both indicate how quickly information can spread from a source vertex to a set of target vertices.

The closeness score is classically defined for each vertex v as either the inverse sum or the inverse average of the shortest distances from v to all other reachable vertices u. The classical definition does not apply to disconnected graphs; alternative definitions of closeness have been proposed for them.
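The two classical variants can be written out as follows. The notation is an assumption introduced here for illustration and does not come from the function reference: d(v, u) is the shortest distance from v to u, and R(v) is the set of vertices other than v that are reachable from v.

```latex
C_{\mathrm{sum}}(v) = \frac{1}{\sum_{u \in R(v)} d(v, u)},
\qquad
C_{\mathrm{avg}}(v) = \frac{|R(v)|}{\sum_{u \in R(v)} d(v, u)}
```

For example, on a three-vertex path a, b, c with unit edge lengths, the distances from a are 1 and 2, so the inverse sum is 1/3 and the inverse average is 2/3.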

The Closeness function applies the classical definition of closeness to connected graphs and an alternative definition to disconnected graphs. The alternative definition adds 0 to the sum for each unreachable target vertex. This is consistent with the classical definition, because the distance to an unreachable vertex is effectively infinite, so its inverse is effectively 0.
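The following Python sketch shows one possible reading of this rule on a small unweighted graph. It is not the Closeness function's implementation: the names `bfs_distances` and `closeness`, the adjacency-dictionary input, and the use of hop counts as distances are all assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop-count shortest distances from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def closeness(adj, source, targets):
    """Return (inverse sum, inverse average) of distances from source to targets.

    Assumption: an unreachable target adds 0 to the sum and is not counted
    in the average, so it has no effect on either score.
    """
    dist = bfs_distances(adj, source)
    total = 0
    reached = 0
    for t in targets:
        if t == source:
            continue
        if t in dist:
            total += dist[t]
            reached += 1
        # Unreachable target: contributes 0, so nothing is added here.
    if total == 0:
        return 0.0, 0.0  # no reachable targets at all
    return 1.0 / total, reached / total


# Hypothetical disconnected graph: a path a-b-c and a separate edge d-e.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b"],
    "d": ["e"],
    "e": ["d"],
}
inv_sum, inv_avg = closeness(graph, "a", list(graph))
```

Here vertex a reaches b and c at distances 1 and 2, while d and e are unreachable and add nothing, so the sketch returns an inverse sum of 1/3 and an inverse average of 2/3.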

The k-degree score is defined for vertex v as the number of vertices whose distance from v is less than or equal to k.
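As a minimal sketch of this definition, the Python function below counts the vertices within k hops of a source on an unweighted graph. The function name `k_degree` and the adjacency-dictionary input are hypothetical, and the sketch assumes that the source vertex itself is not counted, which the definition above does not state.

```python
from collections import deque


def k_degree(adj, source, k):
    """Count vertices (excluding source) whose hop distance from source is <= k."""
    if k < 0:
        raise ValueError("k must be non-negative")
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue  # neighbors of v would be farther than k, so stop expanding
        for u in adj.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return len(dist) - 1  # assumption: the source does not count toward its own score


# Hypothetical path graph a-b-c-d.
path = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}
two_hop_score = k_degree(path, "a", 2)
```

On this path, b and c lie within two hops of a, so the sketch gives a 2-degree score of 2 for a.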

The Closeness function uses a hybrid distributed all-pairs shortest path (APSP) algorithm to calculate the shortest distance from each specified source vertex to each specified target vertex. It then aggregates these shortest distances into closeness and k-degree scores for each source vertex. By restricting the number of parallel single-node shortest path (SNSP) executions to groups of P vertices, the APSP algorithm enables a trade-off between time and memory usage. The APSP algorithm completes when N/P of these groups have completed, where N is the number of vertices in the graph. (For more information, see AllPairsShortestPath.)
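The grouping idea can be illustrated with a single-process Python sketch. This is not the hybrid distributed algorithm itself. It only shows how processing sources in groups of P bounds the number of distance maps held in memory at once. The names `batched_scores`, `group_size`, `inv_sum_dist`, and `k_degree` are hypothetical, edges are assumed to be unweighted, and the per-group searches run sequentially where the real function would run them in parallel.

```python
from collections import deque
from math import ceil


def bfs_distances(adj, source):
    """Return hop-count shortest distances from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def batched_scores(adj, sources, targets, group_size, k):
    """Score sources in groups of group_size; return (scores, groups completed)."""
    scores = {}
    groups_done = 0
    for start in range(0, len(sources), group_size):
        group = sources[start:start + group_size]
        # Only this group's distance maps exist at once: a larger group_size
        # means fewer passes but more memory, and a smaller one the reverse.
        group_dist = {s: bfs_distances(adj, s) for s in group}
        for s, dist in group_dist.items():
            reached = [dist[t] for t in targets if t != s and t in dist]
            total = sum(reached)
            scores[s] = {
                "inv_sum_dist": 1.0 / total if total else 0.0,
                "k_degree": sum(1 for d in reached if d <= k),
            }
        groups_done += 1  # this group's distances can now be discarded
    return scores, groups_done


# Hypothetical path graph a-b-c-d, scored with P = 3 and k = 1.
path = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}
vertices = list(path)
scores, groups_done = batched_scores(path, vertices, vertices, group_size=3, k=1)
```

With N = 4 vertices and P = 3, the loop finishes after two groups, which matches N/P rounded up.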