The edges table of an undirected graph can have duplicate rows, because each edge between vertices A and B is represented by two rows: one with A in the source column and B in the target column, and one with B in the source column and A in the target column. Teradata recommends deleting duplicate rows from the edges table, using this code (where edges_table is the edges table name). The code leaves edges_table itself unchanged and writes the deduplicated rows to a new table, DuplicatesRemoved:

    -- Number every row so the two rows of a reversed pair can be told apart.
    DROP TABLE IF EXISTS copy;
    CREATE TABLE copy DISTRIBUTE BY HASH(source) AS
      SELECT *, ROW_NUMBER() OVER (ORDER BY source, target) rn
      FROM edges_table;

    DROP TABLE IF EXISTS DuplicatesRemoved;
    CREATE TABLE DuplicatesRemoved AS SELECT * FROM copy;

    -- Of each pair (A, B) and (B, A), delete the row with the smaller row number.
    DELETE FROM DuplicatesRemoved
    WHERE rn IN (
      SELECT a.rn
      FROM DuplicatesRemoved a JOIN copy b
        ON a.source = b.target AND a.target = b.source AND a.rn < b.rn
    );

    DROP TABLE IF EXISTS copy;

Column Name | Data Type | Description |
---|---|---|
source | VARCHAR | Source key. |
target | VARCHAR | Target key. |
rn | INTEGER | Row number in edges_table. |
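
To see which row of each reversed pair survives, the deletion logic can be traced on a small edge list. The following is a minimal Python sketch that mimics the SQL above in memory. It is not Teradata code, and the function name `remove_reversed_duplicates` and the sample edges are hypothetical, chosen only for illustration.

```python
def remove_reversed_duplicates(edges):
    """Return (source, target, rn) rows, keeping one row per undirected edge.

    Hypothetical helper that mirrors the SQL: rows are numbered in
    (source, target) order, and a row is dropped when its reversed
    counterpart exists with a larger row number.
    """
    # Equivalent of ROW_NUMBER() OVER (ORDER BY source, target), starting at 1.
    numbered = [(s, t, rn) for rn, (s, t) in enumerate(sorted(edges), start=1)]

    # Largest row number seen for each (source, target) pair.
    max_rn = {}
    for s, t, rn in numbered:
        max_rn[(s, t)] = max(rn, max_rn.get((s, t), 0))

    # Equivalent of the DELETE: drop a row if the reversed pair (t, s)
    # has a row with a strictly larger row number.
    return [(s, t, rn) for s, t, rn in numbered if max_rn.get((t, s), 0) <= rn]


# Assumed sample data: edges A-B and B-C are stored in both directions, A-C once.
sample_edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"), ("A", "C")]
deduplicated = remove_reversed_duplicates(sample_edges)
for row in deduplicated:
    print(row)
```

For this sample, the rows `("A", "B")` and `("B", "C")` are removed because their reversed counterparts have larger row numbers, so each undirected edge remains exactly once. An edge stored in only one direction, such as `("A", "C")`, is kept as is.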