TD_TFIDF Function | TFIDF | Teradata Vantage - TD_TFIDF - Analytics Database

Database Analytic Functions

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Release Number
17.20
Published
June 2022
Language
English (United States)
Last Update
2024-04-06
dita:mapPath
gjn1627595495337.ditamap
dita:ditavalPath
ayr1485454803741.ditaval
dita:id
jmh1512506877710
Product Category
Teradata Vantageā„¢

Term Frequency-Inverse Document Frequency (TF-IDF) is a technique for evaluating the importance of a specific term in a specific document in a document set. Term frequency (tf) is the number of times that the term appears in the document and inverse document frequency (idf) is the number of times that the term appears in the document set. The TF-IDF score for a term is tf * idf. A term with a high TF-IDF score is relevant to the specific document.

You can use the TF-IDF scores as input for documents clustering and classification algorithms, including:

  • Cosine-similarity
  • Latent Dirichlet allocation
  • K-means clustering
  • K-nearest neighbors

TD_TFIDF function represents each document as an N-dimensional vector, where N is the number of terms in the document set (therefore, the document vector is sparse). Each entry in the document vector is the TF-IDF score of a term.