IdentityMatch Example | Teradata Vantage - IdentityMatch Example

IdentityMatch Example | Teradata Vantage - IdentityMatch Example - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product

Teradata Vantage

Release Number

9.02

9.01

2.0

1.3

Published

February 2022

Language

English (United States)

Last Update

2022-02-10

dita:mapPath

rnn1580259159235.ditamap

dita:ditavalPath

ybt1582220416951.ditaval

dita:id

B700-4003

lifecycle

Product Category

Teradata Vantage™

The objective of the example is to find the credit scores of job applicants by matching the information in two tables.

The example looks for exact matches (NominalMatchColumns) to the email address and approximate matches (FuzzyMatchColumns) for lastname, firstname, zip code, city, and department, with different match metrics and match weights.

Input

SourceTable: applicant_reference, which has hypothetical information from job applicants
ReferenceTable: applicant_external, an external source table, which has missing and incomplete information, but includes credit scores

SourceTable: applicant_reference
id	firstname	lastname	email	city	zipcode	department	gender
1	John	Dewey	john.dewey@corp-mark.com	Sugar Land	77459	Marketing	Male
2	Sarah	Anders	sarah.anders@corp-sales.com	Pearland	77584	Sales	Female
3	Elizabeth	Hall	elizabeth.hall@corp-eng.com	Galveston	77550	Engineering	Female
4	James	Nickson	james.nick@corp-it.com	Pasadena	77501	IT	Male
5	Kim	Lee	kim.lee@corp-sys.com	Clear Lake City	77058	Systems	Female
6	Jessica	Right	jessica.right@corp-mark.com	Sugar Land	77459	Marketing	Female

ReferenceTable: applicant_external
id	firstname	lastname	email	city	zipcode	department	creditscore
1	John	Dewey	john.dewey@corp-mark.com	Sugar Land	7774	market	700
2		Hall		Galveston	77550	eng	790
3	Sarah	Anders	sarah.anders@corp-sales.com	pear	77584	sales	650
4	Jessica	right		Sugar Land	77459	Marketing	690
5	James	Nickson		Pasadena	7750	IT	620
6	Kim				77058	system	570

SQL Call

SELECT * FROM IdentityMatch (
  ON applicant_reference AS SourceTable PARTITION BY ANY
  ON applicant_external AS ReferenceTable DIMENSION
  USING
  IDColumn ('a.id: b.id')
  NominalMatchColumns ('a.email: b.email')
  FuzzyMatchColumns ('a.lastname: b.lastname, JARO-WINKLER, 3',
                     'a.firstname: b.firstname, JARO-WINKLER, 2',
                     'a.zipcode: b.zipcode, JD, 2',
                     'a.city: b.city, LD, 2',
                     'a.department: b.department, COSINE, 1')
  Accumulate ('a.firstname','a.lastname' ,'b.lastname', 'a.email',
              'b.email','a.zipcode', 'b.zipcode', 'a.department',
              'b.department','b.creditscore')
  ThresholdScore (0.5)
) AS dt;

Output

The output table has the matching information from both input tables and the matching score. If multiple row entries exist for the same id, the higher scoring entry gives the best match. For example, the first row (with score 1) is a better match than the second (with score 0.6036). The creditscore column contains applicant credit scores.

 a.id a.firstname a.lastname a.email                     a.zipcode a.department b.id b.lastname b.email                     b.zipcode b.department b.creditscore score  
 ---- ----------- ---------- --------------------------- --------- ------------ ---- ---------- --------------------------- --------- ------------ ------------- ------ 
    6 jessica     right      jessica.right@corp-mark.com     77459 marketing       4 right                                      77459 marketing            690.0      1
    4 james       nickson    james.nick@corp-it.com          77501 it              5 nickson                                     7750 it                   620.0    0.8
    3 elizabeth   hall       elizabeth.hall@corp-eng.com     77550 engineering     2 hall                                       77550 eng                  790.0    0.7
    1 john        dewey      john.dewey@corp-mark.com        77459 marketing       4 right                                      77459 marketing            690.0 0.6036
    1 john        dewey      john.dewey@corp-mark.com        77459 marketing       1 dewey      john.dewey@corp-mark.com         7774 market               700.0      1
    2 sarah       anders     sarah.anders@corp-sales.com     77584 sales           3 anders     sarah.anders@corp-sales.com     77584 sales                650.0      1

Download a zip file of all examples and a SQL script file that creates their input tables.