IdentityMatch Example - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:mapPath
blj1506016597986.ditamap
dita:ditavalPath
blj1506016597986.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

The objective of the example is to find the credit scores of job applicants by matching the information in two tables.

The example looks for exact matches (NominalMatchColumns) to the email address and approximate matches (FuzzyMatchColumns) for lastname, firstname, zip code, city, and department, with different match metrics and match weights.

Input

  • a: applicant_reference, which has hypothetical information from job applicants
  • b: applicant_external, an external source table, which has missing and incomplete information, but includes credit scores
a: applicant_reference
id firstname lastname email city zipcode department gender
1 John Dewey john.dewey@corp-mark.com Sugar Land 77459 Marketing Male
2 Sarah Anders sarah.anders@corp-sales.com Pearland 77584 Sales Female
3 Elizabeth Hall elizabeth.hall@corp-eng.com Galveston 77550 Engineering Female
4 James Nickson james.nick@corp-it.com Pasadena 77501 IT Male
5 Kim Lee kim.lee@corp-sys.com Clear Lake City 77058 Systems Female
6 Jessica Right jessica.right@corp-mark.com Sugar Land 77459 Marketing Female
b: applicant_external
id firstname lastname email city zipcode department creditscore
1 John Dewey john.dewey@corp-mark.com Sugar Land 7774 market 700
2   Hall   Galveston 77550 eng 790
3 Sarah Anders sarah.anders@corp-sales.com pear 77584 sales 650
4 Jessica right   Sugar Land 77459 Marketing 690
5 James Nickson   Pasadena 7750 IT 620
6 Kim       77058 system 570

SQL Call

SELECT * FROM IdentityMatch (
  ON applicant_reference AS a PARTITION BY ANY
  ON applicant_external AS b DIMENSION
  USING
  IDColumn ('a.id: b.id')
  NominalMatchColumns ('a.email: b.email')
  FuzzyMatchColumns ('a.lastname: b.lastname, JARO-WINKLER, 3',
                     'a.firstname: b.firstname, JARO-WINKLER, 2',
                     'a.zipcode: b.zipcode, JD, 2',
                     'a.city: b.city, LD, 2',
                     'a.department: b.department, COSINE, 1')
  Accumulate ('a.firstname','a.lastname' ,'b.lastname', 'a.email',
              'b.email','a.zipcode', 'b.zipcode', 'a.department',
              'b.department','b.creditscore')
  ThresholdScore (0.5)
) AS dt ORDER BY "a.id", score DESC;

Output

The output table has the matching information from both input tables and the matching score. If multiple row entries exist for the same id, the higher scoring entry gives the best match. For example, the first row (with score 1) is a better match than the second (with score 0.6036). The creditscore column contains applicant credit scores.

a.id a.firstname a.lastname a.email a.zipcode a.department b.id b.lastname b.email b.zipcode b.department b.creditscore score
1 John Dewey john.dewey@corp-mark.com 77459 Marketing 1 Dewey john.dewey@corp-mark.com 7774 market 700 1.0000
1 John Dewey john.dewey@corp-mark.com 77459 Marketing 4 right   77459 Marketing 690 0.6036
2 Sarah Anders sarah.anders@corp-sales.com 77584 Sales 3 Anders sarah.anders@corp-sales.com 77584 sales 650 1.0000
3 Elizabeth Hall elizabeth.hall@corp-eng.com 77550 Engineering 2 Hall   77550 eng 790 0.7000
4 James Nickson james.nick@corp-it.com 77501 IT 5 Nickson   7750 IT 620 0.8000
6 Jessica Right jessica.right@corp-mark.com 77459 Marketing 4 right   77459 Marketing 690 1.0000