RandomWalkSample Background - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:id
B700-4003
Product Category
Teradata Vantage™

The goal of graph sampling is to identify a subgraph that preserves graph properties as well as possible. If the subgraph is a good representation of the graph, substituting the subgraph for the graph in ML Engine functions significantly decreases runtime without significantly decreasing accuracy.

Random walk sampling is a graph-sampling technique that randomly selects a starting vertex and then, at each step, either explores a neighboring vertex or returns (flies back) to the starting vertex. If the sampling process gets stuck at a sink vertex or within an isolated component or a loop, it randomly selects another starting vertex and continues until the sample reaches the desired size (the desired number of vertices).
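
The walk described above can be sketched in Python. This is a minimal illustration, not the ML Engine implementation. The function name `random_walk_sample`, the adjacency-dictionary input, the `flyback_prob` default of 0.15, and the `max_stall` limit used to detect a walk that is stuck in a loop or small component are all assumptions made for this example.

```python
import random


def random_walk_sample(adjacency, sample_size, flyback_prob=0.15,
                       max_stall=100, seed=None):
    """Return a set of sampled vertices (illustrative sketch only).

    adjacency    -- dict mapping each vertex to a list of its out-neighbors
    sample_size  -- desired number of vertices in the sample
    flyback_prob -- assumed probability of returning to the start vertex
    max_stall    -- assumed number of consecutive steps without a new
                    vertex after which the walk is treated as stuck
    """
    rng = random.Random(seed)
    vertices = sorted(adjacency)
    target = min(sample_size, len(vertices))
    sampled = set()

    while len(sampled) < target:
        # Pick a fresh start vertex that has not been sampled yet,
        # so every restart makes progress.
        start = rng.choice([v for v in vertices if v not in sampled])
        sampled.add(start)
        current = start
        stall = 0

        while len(sampled) < target and stall < max_stall:
            neighbors = adjacency.get(current, [])
            if not neighbors:
                break  # sink vertex: restart from another vertex
            if rng.random() < flyback_prob:
                current = start  # fly back to the starting vertex
            else:
                current = rng.choice(neighbors)  # explore a neighbor
            if current in sampled:
                stall += 1  # no new vertex found on this step
            else:
                sampled.add(current)
                stall = 0

    return sampled


# Example: vertices 1-3 form a loop, 4 -> 5 ends in a sink, 6 is isolated.
graph = {1: [2], 2: [3], 3: [1], 4: [5], 5: [], 6: []}
sample = random_walk_sample(graph, sample_size=4, seed=7)
```

Restarting only from unsampled vertices is a design choice for this sketch: it guarantees that the loop terminates even when the graph consists of many small components.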

The resulting subset graph includes the edges between sampled vertices and their nearest neighbors (edges that exist in the original graph), even if the sampling process did not explore those edges. Including those edges makes the subset graph more representative of the original graph.
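
One way to picture this edge-inclusion step is as taking the induced subgraph: every edge of the original graph whose two endpoints were both sampled is kept, whether or not the walk traversed it. The sketch below assumes that interpretation. The helper name `induced_edges` and the edge-list representation are hypothetical and are not part of the ML Engine function.

```python
def induced_edges(edges, sampled):
    """Keep each original edge whose source and target were both sampled.

    edges   -- iterable of (source, target) pairs from the original graph
    sampled -- collection of sampled vertices
    """
    sampled = set(sampled)
    return [(u, v) for (u, v) in edges if u in sampled and v in sampled]


# Example: the walk visited 1 -> 2 -> 3 but never traversed the edge 3 -> 1.
original_edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
subset_edges = induced_edges(original_edges, {1, 2, 3})
# (3, 1) is kept because both endpoints were sampled; (3, 4) is dropped.
```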

For more information about sampling from large graphs, see: http://cs.stanford.edu/people/jure/pubs/sampling-kdd06.pdf