- Formula
- [Optional] Specify the formula for calculating the term frequency (tf) of term t in document d:
| Option | Description |
|---|---|
| 'normal' | (Default) Normalized frequency: tf(t,d) = f(t,d) / sum{ f(w,d) : w ∈ d }. That is, rf divided by the total number of terms in the document. |
| 'bool' | Boolean frequency: tf(t,d) = 1 if t occurs in d; otherwise, tf(t,d) = 0. |
| 'log' | Logarithmically scaled frequency: tf(t,d) = log(f(t,d) + 1). |
| 'augment' | Augmented frequency, which prevents bias towards longer documents: tf(t,d) = 0.5 + (0.5 × f(t,d) / max{ f(w,d) : w ∈ d }). The fraction is rf divided by the maximum raw frequency of any term in the document. |

In all of these formulas, f(t,d) is the number of times t occurs in d (that is, the raw frequency, rf).
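
The four Formula options above can be illustrated with a minimal Python sketch. It is not the TFIDF function's implementation: the helper name `term_frequency` and its parameters are hypothetical, the document is assumed to be already tokenized into a list of terms, and the natural logarithm is assumed for 'log' because the base is not stated here.

```python
import math
from collections import Counter


def term_frequency(term, doc_terms, formula="normal"):
    """Illustrative tf(t,d) for the four Formula options (hypothetical helper)."""
    if not doc_terms:
        raise ValueError("document must contain at least one term")

    counts = Counter(doc_terms)   # raw frequency f(w,d) for every term w in d
    rf = counts[term]             # f(t,d); 0 if t does not occur in d

    if formula == "normal":
        # rf divided by the total number of terms in the document
        return rf / len(doc_terms)
    if formula == "bool":
        # 1 if the term occurs at all, otherwise 0
        return 1 if rf > 0 else 0
    if formula == "log":
        # assumption: natural logarithm
        return math.log(rf + 1)
    if formula == "augment":
        # rf scaled by the maximum raw frequency of any term in the document
        return 0.5 + 0.5 * rf / max(counts.values())
    raise ValueError(f"unknown formula: {formula!r}")


doc = ["a", "b", "a", "c"]
for option in ("normal", "bool", "log", "augment"):
    print(option, term_frequency("a", doc, option))
```

For the sample document, term "a" occurs twice among four terms, so 'normal' yields 0.5. Term "b" occurs once while the most frequent term occurs twice, so 'augment' yields 0.5 + 0.5 × 1/2 = 0.75 for "b".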
When using the output of a previous run of the TFIDF function on a training document set to predict TFIDF scores on an input document set, use the same Formula value for the input document set that you used for the training document set.