Optional Syntax Elements for TD_TFIDF - Analytics Database

Database Analytic Functions

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-10-04
Product Category: Teradata Vantage™
TFNormalization
Specifies the normalization method for calculating the term frequency (TF).
Default: NORMAL
The argument must have one of the following values:
Values Description
BOOL Boolean frequency:

tf(t,d)= 1 if t occurs in d; otherwise tf(t,d)= 0

COUNT Raw frequency:

tf(t,d)= f(t,d)

where f(t,d) is the number of times term t occurs in document d (the raw frequency, rf).

NORMAL Normalized frequency:

tf(t,d)= f(t,d) / sum {f(w,d) : w ∈ d}

This value is rf divided by the total number of terms in the document, that is, the sum of the raw frequencies of all terms w in d.

LOG Logarithmically-scaled frequency:

tf(t,d)= 1 + log(f(t,d))

This value is 1 plus the natural logarithm of rf.

AUGMENT Augmented frequency, which prevents bias towards longer documents:

tf(t,d)= 0.5 + (0.5 × f(t,d) / max {f(w,d) : w ∈ d})

This value is 0.5 plus half the ratio of rf to the maximum raw frequency of any term in the document, so every term in d scores between 0.5 and 1.
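The following Python sketch applies each TFNormalization formula to one small document, which can help check how the values differ. It is an illustration under stated assumptions, not the TD_TFIDF implementation: the document is assumed to be already tokenized into a list of terms, LOG uses the natural logarithm as described above, and the helper name `term_frequencies` is hypothetical.

```python
import math
from collections import Counter


def term_frequencies(tokens, method="NORMAL"):
    """Return {term: tf(t,d)} for one tokenized document.

    Hypothetical helper mirroring the TFNormalization formulas above;
    a sketch, not the TD_TFIDF implementation.
    """
    raw = Counter(tokens)            # f(t,d): raw count of each term in d
    if not raw:
        return {}                    # empty document: no terms to score
    total = sum(raw.values())        # total number of terms in d
    max_raw = max(raw.values())      # max {f(w,d) : w in d}
    method = method.upper()
    if method == "BOOL":
        return {t: 1.0 for t in raw}  # every returned term occurs at least once
    if method == "COUNT":
        return {t: float(f) for t, f in raw.items()}
    if method == "NORMAL":
        return {t: f / total for t, f in raw.items()}
    if method == "LOG":
        return {t: 1 + math.log(f) for t, f in raw.items()}  # natural log
    if method == "AUGMENT":
        return {t: 0.5 + 0.5 * f / max_raw for t, f in raw.items()}
    raise ValueError(f"unknown TFNormalization value: {method}")


doc = ["data", "base", "data", "query"]
for m in ("BOOL", "COUNT", "NORMAL", "LOG", "AUGMENT"):
    print(m, term_frequencies(doc, m))
```

For this four-token document, NORMAL gives data 0.5 and base and query 0.25 each, while AUGMENT gives data 1.0 and the other two terms 0.75. Only terms that occur in the document are returned, so LOG never evaluates log(0) here.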

IDFNormalization
Specifies the normalization method for calculating the inverse document frequency (IDF).
Default: LOG
The argument must have one of the following values:
Values Description
UNARY idf(t,D)= 1

Used to disable IDF calculation.

LOG idf(t,D)= log(N/Nt)

where N is the total number of documents in the corpus (N= |D|) and Nt is the number of documents d in which term t appears (Nt= |{d ∈ D : t ∈ d}|).

LOGNORM idf(t,D)= 1 + log(N / Nt)
SMOOTH idf(t,D)= 1 + log((1 + N) / (1 + Nt))
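A similar Python sketch computes idf(t,D) for each IDFNormalization value from a corpus of tokenized documents. The helper name `inverse_document_frequencies` and the list-of-token-lists input are assumptions for illustration, and log is assumed to be the natural logarithm; this is not the TD_TFIDF implementation.

```python
import math
from collections import Counter

# idf(t,D) as a function of N = |D| and Nt = number of documents containing t.
IDF_FORMULAS = {
    "UNARY": lambda n, n_t: 1.0,
    "LOG": lambda n, n_t: math.log(n / n_t),
    "LOGNORM": lambda n, n_t: 1 + math.log(n / n_t),
    "SMOOTH": lambda n, n_t: 1 + math.log((1 + n) / (1 + n_t)),
}


def inverse_document_frequencies(corpus, method="LOG"):
    """Return {term: idf(t,D)} for a corpus given as a list of token lists.

    Hypothetical helper mirroring the IDFNormalization formulas above;
    a sketch, not the TD_TFIDF implementation.
    """
    formula = IDF_FORMULAS.get(method.upper())
    if formula is None:
        raise ValueError(f"unknown IDFNormalization value: {method}")
    n_docs = len(corpus)  # N = |D|
    # Nt: set(doc) counts each term at most once per document.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {term: formula(n_docs, n_t) for term, n_t in doc_freq.items()}


corpus = [["data", "base"], ["data", "query"], ["data"]]
for m in ("UNARY", "LOG", "LOGNORM", "SMOOTH"):
    print(m, inverse_document_frequencies(corpus, m))
```

In this corpus, data appears in every document, so LOG gives it an IDF of 0 (and therefore a TF-IDF score of 0), while LOGNORM and SMOOTH give it 1. The leading 1 + in those two formulas is what keeps such ubiquitous terms from being zeroed out.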
Regularization
Specifies the regularization method for calculating the TF-IDF score.
Default: NONE
The argument must have one of the following values:
Values Description
L2 Euclidean regularization:

tfidf(t,d)= tf(t,d) * idf(t,D) / sqrt(sum {(tf(w,d) * idf(w,D))^2 : w ∈ d})

The product of tf and idf values for a term t in document d is divided by the square root of the sum of the squared products of tf and idf values for each term in the document.

L1 Manhattan regularization:

tfidf(t,d)= tf(t,d) * idf(t,D) / sum {|tf(w,d) * idf(w,D)| : w ∈ d}

The product of tf and idf values for a term t in document d is divided by the sum of the absolute values of the tf and idf products for each term in the document.

NONE No regularization:

tfidf(t,d)= tf(t,d) * idf(t,D)
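The Regularization step can be sketched the same way. This Python example is an illustration, not the TD_TFIDF implementation; it assumes tf(t,d) and idf(t,D) have already been computed as dictionaries keyed by term, and the helper name `regularize` is hypothetical. How the function treats a document whose products are all 0 is not stated above, so the sketch simply returns the unscaled values in that case.

```python
import math


def regularize(tf, idf, method="NONE"):
    """Return {term: tfidf(t,d)} for one document.

    tf:  {term: tf(t,d)} for the document
    idf: {term: idf(t,D)} for the corpus (must contain every term in tf)
    Hypothetical helper mirroring the Regularization formulas above.
    """
    # tf(t,d) * idf(t,D) for every term w in d
    products = {t: tf[t] * idf[t] for t in tf}
    method = method.upper()
    if method == "NONE":
        return products
    if method == "L2":
        norm = math.sqrt(sum(v * v for v in products.values()))
    elif method == "L1":
        norm = sum(abs(v) for v in products.values())
    else:
        raise ValueError(f"unknown Regularization value: {method}")
    if norm == 0:
        # Assumption: avoid dividing by zero by returning the products unchanged.
        return products
    return {t: v / norm for t, v in products.items()}


tf = {"data": 3.0, "query": 4.0}
idf = {"data": 1.0, "query": 1.0}
for m in ("NONE", "L2", "L1"):
    print(m, regularize(tf, idf, m))
```

With L2, the squared scores of each document sum to 1; with L1, the absolute scores sum to 1. For tf values 3 and 4 with an idf of 1, L2 returns 0.6 and 0.8.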

Accumulate
Specifies the input columns to copy to the output table.