- Formula
- [Optional] Specifies the formula for calculating the term frequency (tf) of term t in document d:
- 'normal' (normalized frequency, default)
$$\mathrm{tf}(t,d) = \frac{f(t,d)}{\sum_{w \in d} f(w,d)}$$
This value is the raw frequency (rf) of t divided by the total number of terms in the document.
- 'bool' (Boolean frequency)
$$\mathrm{tf}(t,d) = \begin{cases} 1 & \text{if } t \text{ occurs in } d \\ 0 & \text{otherwise} \end{cases}$$
- 'log' (logarithmically scaled frequency)
$$\mathrm{tf}(t,d) = \log\bigl(f(t,d) + 1\bigr)$$
where f(t,d) is the number of times t occurs in d (that is, the raw frequency, rf).
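- 'augment' (augmented frequency, which prevents bias towards longer documents)
$$\mathrm{tf}(t,d) = 0.5 + 0.5 \times \frac{f(t,d)}{\max\{f(w,d) : w \in d\}}$$
This value is 0.5 plus half the ratio of rf to the maximum raw frequency of any term in the document, so it ranges from 0.5 (t does not occur in d) to 1 (t is the most frequent term in d).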
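As an illustration of the four formulas, here is a minimal Python sketch that computes tf(t,d) for a document given as a list of tokens. It is not the TF_IDF function's implementation: the helper name `term_frequency` and its arguments are hypothetical, and the natural logarithm is assumed for 'log' because the base is not stated above.

```python
import math
from collections import Counter


def term_frequency(term, doc_tokens, formula="normal"):
    """Hypothetical helper: tf(term, d) for a document given as a token list."""
    counts = Counter(doc_tokens)  # raw frequency f(w, d) for every term w in d
    if not counts:
        raise ValueError("document is empty")
    rf = counts[term]  # f(t, d); Counter returns 0 if t does not occur in d

    if formula == "normal":
        # rf divided by the total number of terms in d
        return rf / sum(counts.values())
    if formula == "bool":
        # 1 if t occurs in d, otherwise 0
        return 1 if rf > 0 else 0
    if formula == "log":
        # log(f(t, d) + 1); natural log is an assumption
        return math.log(rf + 1)
    if formula == "augment":
        # 0.5 + 0.5 * rf / (maximum raw frequency of any term in d)
        return 0.5 + 0.5 * rf / max(counts.values())
    raise ValueError(f"unknown formula: {formula!r}")


doc = "the cat sat on the mat".split()
for f in ("normal", "bool", "log", "augment"):
    print(f, round(term_frequency("the", doc, f), 4))
# normal 0.3333
# bool 1
# log 1.0986
# augment 1.0
```

In this example "the" occurs twice among six terms and is the most frequent term, so 'augment' gives the maximum value of 1, while a term that occurs once, such as "cat", gets 0.5 + 0.5 × 1/2 = 0.75.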
When you use the output of a previous TF_IDF run on a training document set to predict TF_IDF scores for an input document set, use the same Formula value for the input document set that you used for the training document set.