Deleting Duplicate Edges Table Rows - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.10, 1.1
Published: October 2019
Language: English (United States)
Last Update: 2019-12-31

The Edges table of an undirected graph can contain duplicate rows, because each edge between vertices A and B can be represented by two rows: one with A in the source column and B in the target column, and one with B in the source column and A in the target column. Teradata recommends deleting these duplicate rows from the Edges table. The following code (where edges_table is the Edges table name) writes the result to a table named DuplicatesRemoved, keeping from each such pair the row with the larger row number:

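-- Drop any leftover working table. Skip this statement if copy does not
-- exist; DROP TABLE fails on a missing table.
DROP TABLE copy;

-- Number every row of the Edges table so that each row has a unique rn.
CREATE MULTISET TABLE copy AS (
  SELECT *, ROW_NUMBER() OVER(ORDER BY source, target) rn
  FROM edges_table
) WITH DATA;

-- Same caveat as above: skip if DuplicatesRemoved does not exist.
DROP TABLE DuplicatesRemoved;

CREATE MULTISET TABLE DuplicatesRemoved AS (
  SELECT * FROM copy
) WITH DATA;

-- For each pair of reversed rows (A,B) and (B,A), delete the row with the
-- smaller rn and keep the row with the larger rn.
DELETE FROM DuplicatesRemoved WHERE rn IN (
  SELECT a.rn FROM DuplicatesRemoved a
  JOIN copy b
  ON a.source=b.target AND a.target=b.source AND a.rn < b.rn
);

DROP TABLE copy;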
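
To make it easier to see which rows the DELETE statement removes, the following is a minimal Python sketch of the same logic on an in-memory list of edges. It is an illustration only, not Teradata code: the function name remove_reverse_duplicates and the sample edges are hypothetical, and the sketch assumes each row holds just a source and a target value.

```python
def remove_reverse_duplicates(edges):
    """Return the (source, target, rn) rows that survive the DELETE step."""
    # Mirror ROW_NUMBER() OVER (ORDER BY source, target): number rows from 1.
    numbered = [(s, t, rn) for rn, (s, t) in enumerate(sorted(edges), start=1)]

    # For each (source, target) pair, remember the largest rn that carries it.
    max_rn = {}
    for s, t, rn in numbered:
        max_rn[(s, t)] = max(rn, max_rn.get((s, t), 0))

    # A row is deleted when a row with the reversed pair has a larger rn,
    # which corresponds to the join condition a.rn < b.rn.
    return [(s, t, rn) for s, t, rn in numbered if max_rn.get((t, s), 0) <= rn]


# Hypothetical sample: edges A-B and A-C are stored in both directions,
# while edge B-C is stored only once.
sample = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "A"), ("B", "C")]
print(remove_reverse_duplicates(sample))
# prints [('B', 'A', 3), ('B', 'C', 4), ('C', 'A', 5)]
```

In this sample, rows 1 (A, B) and 2 (A, C) are removed because their reversed counterparts have larger row numbers, and the single B-C row is left untouched because it has no reversed counterpart.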

AllPairsShortestPath DuplicatesRemoved Table Schema

Column   Data Type   Description
------   ---------   -----------
source   VARCHAR     Source key.
target   VARCHAR     Target key.
rn       INTEGER     Row number in edges_table.