TF_IDF - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product

Aster Analytics

Release Number

7.00.02

Published

September 2017

Language

English (United States)

Last Update

2018-04-17

Product Category

Software

The TF_IDF function can do either of the following:

Take any document set and output the inverse document frequency (IDF) and term frequency-inverse document frequency (TF-IDF) scores for each term.
Use the output of a previous run of the TF_IDF function on a training document set to predict the TF-IDF scores of an input (test) document set.
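
The two modes above can be illustrated with a minimal Python sketch. It is not the TF_IDF function's implementation or syntax: it assumes the common textbook definitions (tf = term count / document length, idf = ln(N / df)), and the names fit_idf and tf_idf and the sample documents are hypothetical. The exact weighting that TF_IDF applies may differ.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Return {term: idf} for tokenized training docs, with idf = ln(N / df).

    Assumed textbook formula; not taken from the TF_IDF function itself.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        # Count each term once per document, however often it occurs there.
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def tf_idf(tokens, idf):
    """Score one tokenized document against an existing idf table.

    tf = count / document length. Terms missing from the idf table
    (unseen during training) are skipped.
    """
    counts = Counter(tokens)
    total = len(tokens)
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}

# Mode 1: compute IDF and TF-IDF scores for a document set.
train = [["cat", "sat", "mat"], ["cat", "ate", "fish"]]
idf = fit_idf(train)
train_scores = [tf_idf(doc, idf) for doc in train]

# Mode 2: reuse the training IDF values to score a test document.
test_scores = tf_idf(["cat", "fish", "fish", "dog"], idf)
```

In this sketch, a term that appears in every training document ("cat") gets an IDF of zero, and a test-set term that never appeared in training ("dog") receives no score, because no IDF value exists for it.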

You can use the TF-IDF scores as input for many document clustering and classification algorithms, including:

Cosine-similarity
Latent Dirichlet allocation
K-means clustering
K-nearest neighbors
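
As an illustration of the first item in the list, the following Python sketch computes the cosine similarity of two documents represented as sparse TF-IDF vectors. The function name cosine_similarity and the score values are hypothetical, chosen only for the example; they are not output of the TF_IDF function.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {term: score} dicts."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An all-zero vector has no direction; treat as dissimilar.
    return dot / (norm_a * norm_b)

# Hypothetical TF-IDF scores for three documents.
doc_a = {"fish": 0.23, "ate": 0.23}
doc_b = {"fish": 0.35}
doc_c = {"mat": 0.23, "sat": 0.23}

sim_ab = cosine_similarity(doc_a, doc_b)  # shares "fish" with doc_a
sim_ac = cosine_similarity(doc_a, doc_c)  # no shared terms
```

Documents that share highly weighted terms score closer to 1, and documents with no terms in common score 0.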

You can use the TF-IDF scores derived from a training document set to generate a model with a classification function (for example, SparseSVMTrainer), and then use the TF-IDF scores predicted for a test document set as input to the corresponding prediction function (for example, SparseSVMPredictor).