Background - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product
Aster Analytics
Release Number
6.21
Published
November 2016
Language
English (United States)
Last Update
2018-04-14
dita:mapPath
kiu1466024880662.ditamap
dita:ditavalPath
AA-notempfilter_pdf_output.ditaval
dita:id
B700-1021
lifecycle
previous
Product Category
Software

The goal of graph sampling is to identify a subgraph that preserves graph properties as well as possible. If the subgraph is a good representation of the graph, substituting the subgraph for the graph in analytic functions significantly decreases runtime without significantly decreasing accuracy.

Random walk sampling is a graph-sampling technique that randomly selects a starting vertex and then either explores a neighboring vertex or returns ("flies back") to the starting vertex. If the sampling process reaches a sink vertex (an isolated component or a loop), it randomly selects another vertex and continues until it reaches the desired sample size (the desired number of vertices).

The resulting subset graph includes the edges between sampled vertices and their nearest neighbors (edges that exist in the original graph), even if the sampling process did not explore those edges. Including those edges makes the subset graph more representative of the original graph.

For more information about sampling from large graphs, see: http://cs.stanford.edu/people/jure/pubs/sampling-kdd06.pdf