Deleting Duplicate Output Table Rows - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:mapPath
blj1506016597986.ditamap
dita:ditavalPath
blj1506016597986.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

Duplicate output table rows appear because each pair of items appears in two rows—one row has item1 in col1_item1 and item2 in col1_item2, and the other row has item2 in col1_item1 and item1 in col1_item2. To delete duplicate output table rows, use this code (where output_table is the output table name):

DROP TABLE copy;

CREATE MULTISET TABLE copy AS (
  SELECT *, ROW_NUMBER() OVER(ORDER BY col1_item1, col1_item2) rn
  FROM output_table
) WITH DATA;

DROP TABLE DuplicatesRemoved;

CREATE MULTISET TABLE DuplicatesRemoved AS (
  SELECT * FROM copy
) WITH DATA;

DELETE FROM DuplicatesRemoved WHERE rn IN (
  SELECT a.rn FROM DuplicatesRemoved a
  JOIN copy b
  ON a.col1_item1=b.col1_item2 AND a.col1_item2=b.col1_item1 AND a.rn < b.rn
);

DROP TABLE copy;
CFilter DuplicatesRemoved Table Schema
Column Data Type Description
col1_item1 VARCHAR Name of item1.
col1_item2 VARCHAR Name of item2.
rn INTEGER Row number in output_table when ordered by col1_item1, col1_item2.