Closeness Background - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

Closeness and k-degree scores are fundamental distance-based centrality metrics used in network structure analysis. Both measure the time needed to spread information from a source vertex to a set of target vertices.

The closeness score is classically defined for each vertex v as either the inverse sum or the inverse average of the shortest distances from v to all other reachable vertices u. The classical definition does not apply to disconnected graphs; alternative definitions of closeness have been proposed for them.
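The two classical variants can be written out as follows. The notation is introduced here for illustration and does not come from the function reference: d(v, u) is the shortest distance from v to u, and R(v) is the set of vertices other than v that are reachable from v.

```latex
C_{\mathrm{sum}}(v) = \frac{1}{\sum_{u \in R(v)} d(v, u)},
\qquad
C_{\mathrm{avg}}(v) = \frac{|R(v)|}{\sum_{u \in R(v)} d(v, u)}
```

For example, if v reaches two vertices at distances 1 and 2, the inverse sum is 1/3 and the inverse average is 2/3.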

The Closeness function applies the classical definition of closeness to connected graphs and an alternative definition to disconnected graphs. The alternative definition adds 0 to the sum for each unreachable target vertex. This is consistent with the classical definition, because the distance to an unreachable vertex is infinite, so its inverse is effectively 0.
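The following is a minimal Python sketch of these scores, not the Closeness function's implementation. It assumes an unweighted graph stored as an adjacency dictionary, and it interprets the alternative definition as a sum of inverse distances. The names bfs_distances and closeness_scores are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return hop-count shortest distances from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in graph.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def closeness_scores(graph, source):
    """Return (inverse sum, inverse average, sum of inverse distances)."""
    dist = bfs_distances(graph, source)
    reachable = [d for u, d in dist.items() if u != source]
    total = sum(reachable)
    inv_sum = 1.0 / total if total else 0.0
    inv_avg = len(reachable) / total if total else 0.0
    # Assumed reading of the alternative definition: unreachable vertices
    # never appear in dist, so each of them contributes 0 to this sum.
    sum_inv = sum(1.0 / d for d in reachable)
    return inv_sum, inv_avg, sum_inv


# Disconnected example graph: component {a, b, c} and component {d, e}.
graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b"],
    "d": ["e"], "e": ["d"],
}
print(closeness_scores(graph, "a"))
```

In this example, vertices d and e are unreachable from a, so they are simply left out of every sum rather than making the score undefined.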

The k-degree score is defined for vertex v as the number of vertices whose distance from v is less than or equal to k.
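As a rough illustration, the k-degree score can be computed with a breadth-first search that stops expanding at depth k. This sketch assumes an unweighted graph and assumes that the source vertex itself is not counted; the function name k_degree and the example graph are hypothetical.

```python
from collections import deque


def k_degree(graph, source, k):
    """Count the vertices, other than source, within k hops of source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue  # neighbors of v would be farther than k hops
        for u in graph.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return len(dist) - 1  # assumption: exclude the source itself


# Path graph a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print([k_degree(path, "a", k) for k in range(4)])
```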

The Closeness function uses a hybrid distributed all-pairs shortest path (APSP) algorithm to calculate the shortest distance from each specified source vertex to each specified target vertex. It then aggregates these shortest distances into closeness and k-degree scores for each source vertex. The APSP algorithm runs single node shortest path (SNSP) executions in parallel, but only for one group of P vertices at a time, which enables a trade-off between time and memory usage. The algorithm completes when all N/P groups have completed, where N is the number of vertices in the graph. (For more information, see AllPairsShortestPath.)
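The grouping idea can be sketched in Python as follows. This is a sequential stand-in under stated assumptions, not the distributed algorithm itself: the graph is unweighted, each group's shortest-path runs execute one after another instead of in parallel, and only the inverse-sum closeness is aggregated. The names batched_closeness and group_size (playing the role of P) are hypothetical.

```python
from collections import deque
from math import ceil


def bfs_distances(graph, source):
    """Return hop-count shortest distances from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in graph.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def batched_closeness(graph, sources, group_size):
    """Process sources in groups of group_size; return (scores, groups run)."""
    scores = {}
    groups_run = 0
    for start in range(0, len(sources), group_size):
        group = sources[start:start + group_size]
        # At most group_size distance maps are held in memory at once.
        dist_maps = {s: bfs_distances(graph, s) for s in group}
        for s, dist in dist_maps.items():
            total = sum(dist.values())  # the source's own distance is 0
            scores[s] = 1.0 / total if total else 0.0
        groups_run += 1  # the group's distance maps can now be discarded
    return scores, groups_run


# Path graph a - b - c - d, with every vertex used as a source.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores, groups_run = batched_closeness(path, list(path), group_size=2)
print(scores, groups_run, ceil(len(path) / 2))
```

A larger group size means fewer groups but more distance maps held at once; a smaller one lowers peak memory at the cost of more rounds.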