FellegiSunter Arguments - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
ComparisonFields
Specify the columns of input_table whose values the function uses as field-pair similarities during training. If the value in a column is less than the threshold_value for that field, the field pair does not agree; otherwise, the field pair agrees.
Default behavior: The threshold_value of each field is 1.
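The threshold rule above can be illustrated with a minimal Python sketch. This is not the function's implementation; the helper name field_agreement and the column names name_sim, city_sim, and zip_sim are hypothetical, chosen only to show how a similarity value is turned into an agree/disagree flag with a default threshold of 1.

```python
def field_agreement(similarities, thresholds=None, default_threshold=1.0):
    """Map each field-pair similarity to 1 (agrees) or 0 (does not agree).

    A value below the field's threshold means the pair does not agree;
    a value equal to or above the threshold means it agrees.
    """
    thresholds = thresholds or {}
    return {
        field: int(score >= thresholds.get(field, default_threshold))
        for field, score in similarities.items()
    }


# Hypothetical similarity columns for one object-pair.
pair = {"name_sim": 0.92, "city_sim": 1.0, "zip_sim": 0.4}

# Only name_sim gets an explicit threshold; the others use the default of 1.
agreement = field_agreement(pair, thresholds={"name_sim": 0.9})
print(agreement)
```

With the default threshold of 1, only an exact-match similarity counts as agreement, which is why a lower threshold is typically supplied for fuzzy comparisons such as names.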
TagColumn
[Optional] If you specify this argument, the function uses supervised learning; if you omit it, the function uses unsupervised learning.

This argument specifies the name of the column that indicates whether two objects match. The column must contain only the values 'M' (matched) and 'U' (unmatched).
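When a tag column is available, the m and u probabilities can in principle be estimated by simple counting. The sketch below shows that textbook estimate under stated assumptions: each row already holds 0/1 agreement flags, the helper estimate_m_u and the column names are hypothetical, and nothing here is taken from the function's actual implementation.

```python
def estimate_m_u(rows, fields, tag_key="tag"):
    """Estimate m, u, and p from rows tagged 'M' (matched) or 'U' (unmatched)."""
    bad = [r[tag_key] for r in rows if r[tag_key] not in ("M", "U")]
    if bad:
        raise ValueError(f"tag column may contain only 'M' and 'U', got {bad[0]!r}")

    matched = [r for r in rows if r[tag_key] == "M"]
    unmatched = [r for r in rows if r[tag_key] == "U"]
    if not matched or not unmatched:
        raise ValueError("need at least one 'M' row and one 'U' row")

    # m: share of matched pairs in which the field agrees.
    m = {f: sum(r[f] for r in matched) / len(matched) for f in fields}
    # u: share of unmatched pairs in which the field agrees.
    u = {f: sum(r[f] for r in unmatched) / len(unmatched) for f in fields}
    # p: share of all pairs that are matched.
    p = len(matched) / len(rows)
    return m, u, p


# Hypothetical tagged training data with 0/1 agreement flags.
rows = [
    {"name": 1, "zip": 1, "tag": "M"},
    {"name": 1, "zip": 0, "tag": "M"},
    {"name": 0, "zip": 0, "tag": "U"},
    {"name": 1, "zip": 0, "tag": "U"},
    {"name": 0, "zip": 0, "tag": "U"},
    {"name": 0, "zip": 1, "tag": "U"},
]
m, u, p = estimate_m_u(rows, fields=["name", "zip"])
print(m, u, p)
```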

InitialM
[Optional] For unsupervised learning, this argument specifies the initial value of m, which is the probability that a field agrees, given that both members of the object-pair represent the same object. For supervised learning, the function ignores this argument.
Default: 0.9
InitialU
[Optional] For unsupervised learning, this argument specifies the initial value of u, which is the probability that a field agrees, given that the members of the object-pair represent different objects. For supervised learning, the function ignores this argument.
Default: 0.1
InitialP
[Optional] For unsupervised learning, this argument specifies the initial value of p, which is the proportion of all possible object-pairs that refer to the same object. For supervised learning, the function ignores this argument.
Default: 0.1
MaxIteration
[Optional] For unsupervised learning, this argument specifies the maximum number of iterations. For supervised learning, the function ignores this argument.
Default: 100
Eta
[Optional] For unsupervised learning, this argument specifies the tolerance of the termination criterion. At the end of each iteration, the function computes the difference between the current value of p and the value of p at the end of the previous iteration. If the difference is less than eta_value, the function terminates.
Default: 1 × 10^-5 (0.00001)
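The roles of InitialM, InitialU, InitialP, MaxIteration, and Eta can be seen in a minimal expectation-maximization sketch. It assumes the standard conditional-independence form of the Fellegi-Sunter model and 0/1 agreement patterns; the function name fellegi_sunter_em, the clamping helper, and the sample data are hypothetical, and the actual function may differ in its details.

```python
def _clamp(x, eps=1e-9):
    """Keep a probability strictly inside (0, 1) to avoid division by zero."""
    return min(max(x, eps), 1.0 - eps)


def fellegi_sunter_em(patterns, initial_m=0.9, initial_u=0.1, initial_p=0.1,
                      max_iteration=100, eta=1e-5):
    """Estimate m, u, and p from untagged 0/1 agreement patterns with EM."""
    if not patterns:
        raise ValueError("patterns must not be empty")
    n = len(patterns)
    n_fields = len(patterns[0])
    m = [_clamp(initial_m)] * n_fields
    u = [_clamp(initial_u)] * n_fields
    p = _clamp(initial_p)
    iterations = 0

    for iterations in range(1, max_iteration + 1):
        # E-step: posterior probability that each pair is a match.
        g = []
        for gamma in patterns:
            pm, pu = p, 1.0 - p
            for k, agrees in enumerate(gamma):
                pm *= m[k] if agrees else 1.0 - m[k]
                pu *= u[k] if agrees else 1.0 - u[k]
            g.append(pm / (pm + pu))

        # M-step: re-estimate p, m, and u from the posteriors.
        new_p = _clamp(sum(g) / n)
        match_mass = new_p * n
        unmatch_mass = n - match_mass
        m = [_clamp(sum(gi * gamma[k] for gi, gamma in zip(g, patterns)) / match_mass)
             for k in range(n_fields)]
        u = [_clamp(sum((1.0 - gi) * gamma[k] for gi, gamma in zip(g, patterns)) / unmatch_mass)
             for k in range(n_fields)]

        # Termination: stop when p changes by less than eta between iterations.
        converged = abs(new_p - p) < eta
        p = new_p
        if converged:
            break

    return m, u, p, iterations


# Hypothetical agreement patterns over three fields.
patterns = ([(1, 1, 1)] * 20 + [(1, 1, 0)] * 5 + [(1, 0, 1)] * 5 +
            [(0, 0, 0)] * 60 + [(1, 0, 0)] * 5 + [(0, 1, 0)] * 5)
m, u, p, iterations = fellegi_sunter_em(patterns)
print(m, u, p, iterations)
```

In this sketch the loop ends either when the change in p falls below eta or when max_iteration is reached, whichever comes first.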
Lambda
[Optional] Specify the Type I (false positive) error, which occurs if an unmatched comparison is erroneously linked.
Default: 0.9
Mu
[Optional] Specify the Type II (false negative) error, which occurs if a matched comparison is erroneously not linked.
Default: 0.9
Lambda and Mu determine the values of the model properties lower_bound and upper_bound. For details, see: Fellegi, Ivan; Sunter, Alan (December 1969). "A Theory for Record Linkage," Journal of the American Statistical Association 64
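To show how a lower and an upper bound are typically used in the Fellegi-Sunter framework, the sketch below computes a log-likelihood-ratio match weight for an agreement pattern and compares it with two bounds. The m and u values, the bound values, the treatment of weights exactly equal to a bound, and the function names are all assumptions for illustration; how the function derives lower_bound and upper_bound from Lambda and Mu is not shown here.

```python
import math


def match_weight(gamma, m, u):
    """Sum per-field log2 likelihood ratios for a 0/1 agreement pattern."""
    weight = 0.0
    for agrees, mk, uk in zip(gamma, m, u):
        if agrees:
            weight += math.log2(mk / uk)
        else:
            weight += math.log2((1.0 - mk) / (1.0 - uk))
    return weight


def classify(weight, lower_bound, upper_bound):
    """Three-way decision: link, non-link, or possible link in between."""
    if weight >= upper_bound:
        return "link"
    if weight <= lower_bound:
        return "non-link"
    return "possible link"


# Hypothetical model parameters and bounds.
m = [0.9, 0.8]
u = [0.1, 0.2]
lower_bound, upper_bound = -2.0, 3.0

decisions = {
    gamma: classify(match_weight(gamma, m, u), lower_bound, upper_bound)
    for gamma in [(1, 1), (1, 0), (0, 0)]
}
print(decisions)
```

Pairs whose weight falls between the two bounds are usually set aside for manual review rather than decided automatically.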