The TF_IDF function can do either of the following:
- Take any document set and output the inverse document frequency (IDF) and term frequency-inverse document frequency (TF-IDF) scores for each term.
- Use the output of a previous run of the TF_IDF function on a training document set to predict TF-IDF scores of an input (test) document set.
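The two modes can be sketched in Python as follows. This is a minimal illustration, not the TF_IDF function's actual implementation: the helper names `fit_idf` and `tf_idf` are hypothetical, and the weighting (term count divided by document length, multiplied by the natural log of document count over document frequency) is an assumed common variant that may differ from the formula the function uses. Skipping test terms that never appeared in the training set is also an assumption.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Compute an IDF value per term from a list of tokenized documents."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        # Count each term once per document, however often it occurs there.
        doc_freq.update(set(tokens))
    # Assumed variant: idf = ln(N / df). Other variants add smoothing.
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def tf_idf(tokens, idf):
    """Score one tokenized document against an existing IDF table."""
    counts = Counter(tokens)
    total = len(tokens)
    # Assumption: terms missing from the IDF table are skipped.
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}

# Mode 1: compute IDF and TF-IDF scores for a document set.
train_docs = [["apple", "banana", "apple"], ["banana", "cherry"]]
idf = fit_idf(train_docs)
train_scores = [tf_idf(doc, idf) for doc in train_docs]

# Mode 2: reuse the training IDF values to score a test document.
test_scores = tf_idf(["apple", "cherry", "durian"], idf)
```

In this sketch, "banana" appears in every training document, so its IDF is zero, and "durian" receives no score because the training set never saw it.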
You can use the TF-IDF scores as input for many document clustering and classification algorithms, including:
- Cosine similarity
- Latent Dirichlet allocation
- K-means clustering
- K-nearest neighbors
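As an illustration of the first item in the list, the following Python sketch computes the cosine similarity of two documents represented as sparse term-to-score mappings. The function name `cosine_similarity`, the dictionary representation, and the sample scores are all hypothetical, chosen only to show how TF-IDF scores might feed a similarity measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {term: score} dicts."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(score * b[term] for term, score in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty or all-zero vectors
    return dot / (norm_a * norm_b)

# Made-up TF-IDF scores for two documents that share one term.
doc_a = {"apple": 0.5, "cherry": 0.5}
doc_b = {"apple": 0.5, "durian": 0.5}
similarity = cosine_similarity(doc_a, doc_b)
```

Because the two sample documents share one of their two equally weighted terms, the similarity comes out at about 0.5.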
For example, you can use the TF-IDF scores derived from a training document set to generate a model with a classification function (such as SparseSVMTrainer), and then use the TF-IDF scores predicted for a test document set as input to a classification prediction function (such as SparseSVMPredictor).