TF_IDF Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product

Aster Analytics

Release Number

7.00.02

Published

September 2017

Language

English (United States)

Last Update

2018-04-17

dita:id

B700-1022

Product Category

Software

Formula

[Optional] Specifies the formula for calculating the term frequency (tf) of term t in document d:

'normal' (normalized frequency, default)
tf(t,d) = f(t,d) / sum { f(w,d) : w ∈ d }

This value is rf divided by the number of terms in the document.
'bool' (Boolean frequency)
tf(t,d) = 1 if t occurs in d; otherwise, tf(t,d) = 0.
'log' (logarithmically-scaled frequency)
tf(t,d) = log(f(t,d) + 1)

where f(t,d) is the number of times t occurs in d (that is, the raw frequency, rf).
'augment' (augmented frequency, which prevents bias towards longer documents)
tf(t,d) = 0.5 + (0.5 × f(t,d) / max { f(w,d) : w ∈ d })

The fraction is rf divided by the maximum raw frequency of any term in the document.
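
The four Formula values can be compared side by side on a toy document. The following is a minimal Python sketch, not Aster code: the function name term_frequencies and the sample token list are hypothetical, and the natural logarithm is assumed for 'log' because the base is not stated here.

```python
import math
from collections import Counter


def term_frequencies(tokens, formula="normal"):
    """Return {term: tf(t,d)} for one document (hypothetical helper)."""
    raw = Counter(tokens)  # f(t,d): raw frequency (rf) of each term
    if not raw:
        return {}
    if formula == "normal":
        total = sum(raw.values())  # number of terms in the document
        return {t: f / total for t, f in raw.items()}
    if formula == "bool":
        return {t: 1 for t in raw}  # absent terms implicitly have tf = 0
    if formula == "log":
        # Assumption: natural logarithm
        return {t: math.log(f + 1) for t, f in raw.items()}
    if formula == "augment":
        peak = max(raw.values())  # maximum rf of any term in the document
        return {t: 0.5 + 0.5 * f / peak for t, f in raw.items()}
    raise ValueError(f"unknown formula: {formula!r}")


# Illustrative document: 'aster' occurs twice, the other terms once.
doc = ["aster", "analytics", "aster", "foundation"]
for name in ("normal", "bool", "log", "augment"):
    print(name, term_frequencies(doc, name))
```

For this sample, 'normal' gives 'aster' a tf of 2/4 = 0.5, while 'augment' gives it 0.5 + 0.5 × 2/2 = 1.0 and gives each single-occurrence term 0.75.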

When using the output of a previous run of the TF_IDF function on a training document set to predict TF_IDF scores on an input document set, use the same Formula value for the input document set that you used for the training document set.