Deleting Duplicate Output Table Rows

Deleting Duplicate Output Table Rows - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product

Aster Analytics

Release Number

6.21

Published

November 2016

Language

English (United States)

Last Update

2018-04-14

dita:mapPath

kiu1466024880662.ditamap

dita:ditavalPath

AA-notempfilter_pdf_output.ditaval

dita:id

B700-1021

lifecycle

Product Category

Software

Duplicate output table rows appear because each pair of items appears in two rows—one row has item1 in col1_item1 and item2 in col1_item2, and the other row has item2 in col1_item1 and item1 in col1_item2. To delete duplicate output table rows, use this code (where output_table is the output table name):

DROP TABLE IF EXISTS copy;

CREATE TABLE copy DISTRIBUTE BY HASH(col1_item1)
  AS SELECT *, ROW_NUMBER() OVER(ORDER BY col1_item1, col1_item2) rn
    FROM  output_table;

DROP TABLE IF EXISTS DuplicatesRemoved;

CREATE TABLE DuplicatesRemoved AS SELECT * FROM copy;

DELETE FROM DuplicatesRemoved WHERE rn IN (
  SELECT a.rn FROM DuplicatesRemoved a
  JOIN copy b
  ON a.col1_item1=b.col1_item2 AND a.col1_item2=b.col1_item1 AND a.rn < b.rn);

DROP TABLE IF EXISTS copy;

CFilter DuplicatesRemoved Table Schema
Column Name	Data Type	Description
col1_item1	VARCHAR	Name of item1.
col1_item2	VARCHAR	Name of item2.
rn	INTEGER	Row number in output_table when ordered by col1_item1, col1_item2.