7.00.02 - Background - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product
Aster Analytics
Release Number
7.00.02
Published
September 2017
Language
English (United States)
Last Update
2018-04-17

TF-IDF stands for "term frequency-inverse document frequency," a technique for evaluating the importance of a specific term in a specific document in a document set. Term frequency (tf) is the number of times that the term appears in the document. Inverse document frequency (idf) measures how rare the term is across the document set: it decreases as the number of documents that contain the term increases, and is commonly computed as the logarithm of the total number of documents divided by the number of documents that contain the term. The TF-IDF score for a term is tf * idf. A term with a high TF-IDF score is especially relevant to the specific document.
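The following minimal Python sketch illustrates the score on a toy document set. It assumes raw term counts for tf and log(total documents / documents containing the term) for idf, which is a common convention; the exact weighting applied by the TF_IDF function may differ. The helper name tf_idf and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return {doc_id: {term: score}} for a dict of doc_id -> list of tokens.

    Assumption: tf is the raw count and idf is log(n_docs / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)  # raw term counts in this document
        scores[doc_id] = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    return scores

# Hypothetical two-document set
docs = {
    "d1": ["aster", "analytics", "aster"],
    "d2": ["analytics", "guide"],
}
scores = tf_idf(docs)
```

In this sketch, "analytics" appears in every document, so its idf and its score are zero, while "aster" appears twice in d1 and nowhere else, so it receives the highest score for d1.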

The TF_IDF function represents each document as an N-dimensional vector, where N is the number of distinct terms in the document set. Because a typical document contains only a small fraction of those terms, the document vector is usually very sparse. Each entry in the document vector is the TF-IDF score of a term.
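To make the vector representation concrete, the Python sketch below expands per-document scores into N-dimensional vectors over a shared vocabulary. It is an illustration only and does not describe how the TF_IDF function stores its output; the helper name to_vectors and the score values are hypothetical.

```python
def to_vectors(scores):
    """Expand {doc_id: {term: score}} into dense vectors over a shared vocabulary."""
    # N = number of distinct terms across the whole document set.
    vocab = sorted({t for doc in scores.values() for t in doc})
    # Terms absent from a document get a score of 0.0.
    vectors = {d: [doc.get(t, 0.0) for t in vocab] for d, doc in scores.items()}
    return vocab, vectors

# Hypothetical TF-IDF scores for three short documents
scores = {
    "d1": {"aster": 2.2, "analytics": 0.4},
    "d2": {"guide": 1.1, "analytics": 0.4},
    "d3": {"vector": 1.1},
}
vocab, vectors = to_vectors(scores)
```

Even with only four distinct terms, most entries are zero; with a realistic vocabulary of many thousands of terms, nearly all entries of each vector are zero, which is why a sparse layout that keeps only the nonzero scores is the natural choice.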