7.00.02 - Background - Aster Analytics

Teradata Aster® Analytics Foundation User GuideUpdate 2

Product
Aster Analytics
Release Number
7.00.02
Release Date
September 2017
Content Type
Programming Reference
User Guide
Publication ID
B700-1022-700K
Language
English (United States)

TF-IDF stands for "term frequency-inverse document frequency," a technique for evaluating the importance of a specific term in a specific document in a document set. Term frequency (tf) is the number of times that the term appears in the document and inverse document frequency (idf) is the number of times that the term appears in the document set. The TF-IDF score for a term is tf *idf. A term with a high TF-IDF score is especially relevant to the specific document.

The TF_IDF function represents each document as an N-dimensional vector, where N is the number of terms in the document set (therefore, the document vector is usually very sparse). Each entry in the document vector is the TF-IDF score of a term.