Detecting UNIQUE and PRIMARY KEY violations in the data to be loaded is not always straightforward. In many cases the source is not a database you can easily run a query on to detect non-unique keys. These are some techniques you can use to detect these conditions in your source data:
- Build a version of the target table in the target database without a UNIQUE or PRIMARY KEY constraint, load the data, then run a “detect duplicates” query to find the problematic rows/keys. In some cases only loading a sample of the data is sufficient to provide enough clues to find and fix the problem in the source data.
- An alternative step (using an ETL tool) would be to use this “keyless” version of the target table as a staging/temp table, which would load then check for issues like duplicate keys and dump them to a second error table. If no issues are found, then transfer the data to the final destination table.
- If the source is a database, then run the “detect duplicates” query there.