7.00.02 - TF_IDF Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product: Aster Analytics
Release Date: September 2017
Content Type: Programming Reference, User Guide
Language: English (United States)
Formula

[Optional] Specifies the formula for calculating the term frequency (tf) of term t in document d:
  • 'normal' (normalized frequency, default)

    tf(t,d) = f(t,d) / sum {f(w,d) : w ∈ d}

    This value is rf divided by the number of terms in the document.

  • 'bool' (Boolean frequency)

    tf(t,d) = 1 if t occurs in d; otherwise, tf(t,d) = 0.

  • 'log' (logarithmically-scaled frequency)

    tf(t,d) = log(f(t,d) + 1)

    where f(t,d) is the number of times t occurs in d (that is, the raw frequency, rf).

  • 'augment' (augmented frequency, which prevents bias towards longer documents)

    tf(t,d) = 0.5 + (0.5 × f(t,d) / max {f(w,d) : w ∈ d})

    In this formula, rf is divided by the maximum raw frequency of any term in the document; the result is then halved and added to 0.5.
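The four formulas above can be sketched in Python for illustration. This is not the TF_IDF function's implementation: the helper name term_frequency and its parameters are hypothetical, the document is assumed to be already tokenized into a list of terms, and the natural logarithm is assumed for 'log' because the base is not stated here.

```python
import math
from collections import Counter


def term_frequency(term, doc_terms, formula="normal"):
    """Illustrative tf(t,d) for one term in one tokenized document.

    Hypothetical helper, not part of Aster Analytics.
    """
    if not doc_terms:
        raise ValueError("document must contain at least one term")

    counts = Counter(doc_terms)
    rf = counts[term]  # raw frequency f(t,d); 0 if the term is absent

    if formula == "normal":
        # rf divided by the total number of terms in the document
        return rf / len(doc_terms)
    if formula == "bool":
        # 1 if the term occurs in the document, otherwise 0
        return 1 if rf > 0 else 0
    if formula == "log":
        # natural logarithm is an assumption; the base is not specified
        return math.log(rf + 1)
    if formula == "augment":
        # rf scaled by the maximum raw frequency of any term in the document
        return 0.5 + 0.5 * rf / max(counts.values())
    raise ValueError(f"unknown formula: {formula!r}")


doc = ["a", "b", "a", "c"]
for name in ("normal", "bool", "log", "augment"):
    print(name, term_frequency("a", doc, name))
```

For the sample document, term "a" has rf = 2 out of 4 terms, so 'normal' yields 0.5, while 'augment' yields 1.0 because "a" is the most frequent term.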

When using the output of a previous run of the TF_IDF function on a training document set to predict TF_IDF scores on an input document set, use the same Formula value for the input document set that you used for the training document set.