1.1 - 8.10 - TFIDF Syntax Elements - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.1, 8.10
Release Date: October 2019
Content Type: Programming Reference
Publication ID: B700-4003-079K
Language: English (United States)
Formula
[Optional] Specify the formula for calculating the term frequency (tf) of term t in document d:
Option Description
'normal' (Default) Normalized frequency:

tf(t,d) = f(t,d) / sum { f(w,d) : w ∈ d }

This value is the raw frequency (rf) of t divided by the total number of terms in the document.

'bool' Boolean frequency:

tf(t,d) = 1 if t occurs in d; otherwise, tf(t,d) = 0.

'log' Logarithmically-scaled frequency:

tf(t,d) = log(f(t,d) + 1)

where f(t,d) is the number of times t occurs in d (that is, the raw frequency, rf).

'augment' Augmented frequency, which prevents bias towards longer documents:

tf(t,d) = 0.5 + (0.5 × f(t,d) / max { f(w,d) : w ∈ d })

The ratio in this formula is rf divided by the maximum raw frequency of any term in the document.
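
The four options can be compared side by side with a small sketch in plain Python. This is an illustration of the formulas only, not the TFIDF function itself: the helper name term_frequency and the sample document are hypothetical, and the use of the natural logarithm for 'log' is an assumption, since the base is not stated here.

```python
import math
from collections import Counter


def term_frequency(tokens, formula="normal"):
    """Return {term: tf} for one document under the given Formula option.

    Hypothetical helper for illustration; terms absent from the document
    are simply missing from the result (their tf is 0 under 'bool').
    """
    counts = Counter(tokens)        # raw frequency f(t,d) of each term
    if not counts:
        return {}
    total = sum(counts.values())    # number of terms in the document
    peak = max(counts.values())     # highest raw frequency in the document

    if formula == "normal":
        return {t: c / total for t, c in counts.items()}
    if formula == "bool":
        return {t: 1 for t in counts}
    if formula == "log":
        # Natural log is an assumption; the base is not given in the text.
        return {t: math.log(c + 1) for t, c in counts.items()}
    if formula == "augment":
        return {t: 0.5 + 0.5 * c / peak for t, c in counts.items()}
    raise ValueError(f"unknown formula: {formula!r}")


doc = ["data", "engine", "data", "model"]
for name in ("normal", "bool", "log", "augment"):
    print(name, term_frequency(doc, name))
```

In this sample document, "data" occurs twice out of four terms, so 'normal' gives 0.5 and 'augment' gives 1.0, while a term that occurs once gets 0.25 and 0.75 respectively.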

When using the output of a previous run of the TFIDF function on a training document set to predict TFIDF scores on an input document set, use the same Formula value for the input document set that you used for the training document set.
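
A rough way to see why the Formula values must match is to compute tf for the same term under two different options. The sketch below is plain Python, not the TFIDF function; the helper tf is hypothetical, it covers only 'normal' and 'log', and the natural logarithm is an assumption.

```python
import math
from collections import Counter


def tf(tokens, formula):
    """Hypothetical helper: tf per term for 'normal' or 'log' only."""
    counts = Counter(tokens)
    if formula == "normal":
        total = sum(counts.values())
        return {t: c / total for t, c in counts.items()}
    if formula == "log":
        # Natural log is an assumption; the base is not given in the text.
        return {t: math.log(c + 1) for t, c in counts.items()}
    raise ValueError(f"unsupported formula: {formula!r}")


doc = ["data", "data", "model"]
tf_normal = tf(doc, "normal")["data"]   # 2 occurrences out of 3 terms
tf_log = tf(doc, "log")["data"]         # log(2 + 1)

# The same term in the same document lands on two different scales.
print(tf_normal, tf_log)
```

Because the options scale tf differently, scores derived from a training run under one option are not directly comparable with input scores computed under another.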