TF_IDF - Aster Analytics

Teradata Aster® Analytics Foundation User Guide

September 2017
The TF_IDF function can do either of the following:

  • Take any document set and output the inverse document frequency (IDF) and term frequency-inverse document frequency (TF-IDF) scores for each term.
  • Use the output of a previous run of the TF_IDF function on a training document set to predict the TF-IDF scores of an input (test) document set.
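
The two modes above can be illustrated with a minimal Python sketch. It is not the Aster implementation: it assumes the common textbook definitions (TF as a term's count divided by the document length, IDF as the natural log of the number of documents divided by the number of documents containing the term), which may differ from the exact formulas that the TF_IDF function applies. The helper names `fit_idf` and `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Compute IDF per term from a document set (dict: doc_id -> list of tokens).

    Assumes idf(t) = ln(N / df(t)); this is a textbook definition, not
    necessarily the one the Aster TF_IDF function uses.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(docs, idf):
    """Score each term in each document as tf * idf, using a given IDF table.

    Terms missing from the IDF table (unseen in training) are skipped.
    """
    scores = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[doc_id] = {
            term: (count / total) * idf[term]
            for term, count in counts.items()
            if term in idf
        }
    return scores


# Hypothetical, already tokenized documents.
train = {
    "d1": ["cat", "sat", "mat"],
    "d2": ["cat", "ran"],
}
test = {"t1": ["cat", "sat", "sat", "dog"]}

# Mode 1: derive IDF and TF-IDF scores from a document set.
idf = fit_idf(train)
train_scores = tf_idf(train, idf)

# Mode 2: reuse the training IDF values to score a test document set.
test_scores = tf_idf(test, idf)
```

In this sketch, a term that appears in every training document ("cat") gets an IDF of zero, and a test term never seen in training ("dog") receives no score. How the Aster function treats such terms is not stated here.
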
You can use the TF-IDF scores as input for many document clustering and classification algorithms, including:
  • Cosine-similarity
  • Latent Dirichlet allocation
  • K-means clustering
  • K-nearest neighbors
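
As an illustration of the first item in the list, the following minimal Python sketch computes the cosine similarity of two documents represented as sparse TF-IDF vectors (dictionaries mapping terms to scores). The function name `cosine_similarity` and the sample weights are hypothetical and are not output of the TF_IDF function.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as dict: term -> weight."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with an all-zero vector as 0
    return dot / (norm_a * norm_b)


# Hypothetical TF-IDF vectors for two documents.
doc_a = {"sat": 0.35, "mat": 0.23}
doc_b = {"sat": 0.35, "ran": 0.35}

similarity = cosine_similarity(doc_a, doc_b)
```

Documents that share no terms score 0, and a document compared with itself scores 1, so the value can serve as the similarity measure in the clustering and nearest-neighbor methods listed above.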

You can use the TF-IDF scores derived from a training document set to generate a model with a classification function (for example, SparseSVMTrainer), and then use the TF-IDF scores predicted for a test document set as input to a classification prediction function (for example, SparseSVMPredictor).