1.1 - 8.10 - RandomWalkSample Background - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Teradata Vantage
Release Number
October 2019
Content Type
Programming Reference
Publication ID
English (United States)

The goal of graph sampling is to identify a subgraph that preserves graph properties as well as possible. If the subgraph is a good representation of the graph, substituting the subgraph for the graph in ML Engine functions significantly decreases runtime without significantly decreasing accuracy.

Random walk sampling is a graph-sampling technique that randomly selects a starting vertex and then either explores a neighboring vertex or returns (flies back) to the starting vertex. If the sampling process reaches a sink vertex (an isolated component or a loop), it randomly selects another vertex and continues until it reaches the desired sample size (the desired number of vertices).

The resulting subset graph includes the edges between sampled vertices and their nearest neighbors (edges that exist in the original graph), even if the sampling process did not explore those edges. Including those edges makes the subset graph more representative of the original graph.

For more information about sampling from large graphs, see: http://cs.stanford.edu/people/jure/pubs/sampling-kdd06.pdf.