TFIDF Input - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.10
1.1
Published
October 2019
Language
English (United States)
Last Update
2019-12-31
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

The TFIDF function always requires the output of the TF function as input. Whether the other TFIDF input tables are required or optional depends on your reason for running the function.

| Table | Description |
|---|---|
| TF | TF function input; document set. |
| DocCount | Required if you run the function to output IDF and TF-IDF values for each term in the document set. |
| DocPerTerm | Optional if you run the function to output IDF and TF-IDF values for each term in the document set. If you omit this table, the function creates it by processing the entire document set, which can require a large amount of memory. If there is not enough memory to process the entire document set, the DocPerTerm table is required. |
| IDF | Required if you run the function to predict TF-IDF scores. This table is the output of an earlier call to TFIDF whose inputs were the TF output for the training document set, the DocCount table, and, optionally, the DocPerTerm table. |
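
The two ways of running the function described above (producing IDF values from a training document set, then reusing that IDF table to score other documents) can be pictured as a fit step followed by a score step. The following Python sketch is only an illustration under stated assumptions. It uses the textbook definitions idf = ln(DocCount / DocPerTerm) and tf-idf = tf × idf, which may differ from the formula that TFIDF actually applies. The names `fit_idf` and `score_tfidf`, the sample values, and the choice to skip terms missing from the IDF table are all hypothetical.

```python
import math


def fit_idf(doc_count, doc_per_term):
    """Build a term -> idf mapping from DocCount and DocPerTerm values.

    Assumption: textbook IDF, ln(DocCount / DocPerTerm). The formula used
    by the TFIDF function itself is not stated in this section.
    """
    return {term: math.log(doc_count / n_docs)
            for term, n_docs in doc_per_term.items()}


def score_tfidf(tf_rows, idf):
    """Score (docid, term, tf) rows against a previously fitted IDF mapping.

    Assumption: terms that are absent from the IDF mapping are skipped.
    """
    return [(docid, term, tf * idf[term])
            for docid, term, tf in tf_rows
            if term in idf]


# Fit step: training set with 4 documents (hypothetical values).
idf = fit_idf(4, {"apple": 2, "plum": 1})

# Score step: rows for a new document, shaped like the TF output.
new_rows = [(10, "apple", 0.5), (10, "kiwi", 0.5)]
scores = score_tfidf(new_rows, idf)
print(scores)
```

In this sketch the score step never sees the training documents, only the fitted IDF mapping, which mirrors why the IDF table is the required input when predicting.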

TF Schema

| Column | Data Type | Description |
|---|---|---|
| docid | Any | Document identifier. |
| term | VARCHAR | Term. |
| count | INTEGER | Number of times that term appears in document. |

TF Output and TFIDF Input Table Schema

| Column | Data Type | Description |
|---|---|---|
| docid | Any | Document identifier. |
| term | VARCHAR | Term. |
| tf | DOUBLE PRECISION | Term frequency. |
| count | INTEGER | Number of times that term appears in document. |

DocCount Schema

| Column | Data Type | Description |
|---|---|---|
| count | BIGINT | Number of documents in document set. |

DocPerTerm Schema

| Column | Data Type | Description |
|---|---|---|
| term | VARCHAR | Term. |
| count | BIGINT | Number of documents that contain term. |
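
Both the DocCount and DocPerTerm values described above can be derived from rows shaped like the TF schema (docid, term, count). The following Python sketch shows that derivation on a few made-up rows. It is not how the function builds these tables internally, and in practice they would be created in the database. The helper name `build_count_tables` and the sample data are hypothetical.

```python
from collections import defaultdict


def build_count_tables(tf_rows):
    """Derive DocCount and DocPerTerm values from (docid, term, count) rows.

    Hypothetical helper for illustration only.
    """
    doc_ids = set()                    # distinct documents seen
    docs_per_term = defaultdict(set)   # term -> distinct documents containing it
    for docid, term, _count in tf_rows:
        doc_ids.add(docid)
        docs_per_term[term].add(docid)
    doc_count = len(doc_ids)
    doc_per_term = {term: len(ids) for term, ids in docs_per_term.items()}
    return doc_count, doc_per_term


# Made-up document set: three documents, three distinct terms.
rows = [
    (1, "apple", 2),
    (1, "pear", 1),
    (2, "apple", 1),
    (3, "plum", 4),
]
doc_count, doc_per_term = build_count_tables(rows)
print(doc_count)     # 3
print(doc_per_term)  # {'apple': 2, 'pear': 1, 'plum': 1}
```

The sketch holds a set of document identifiers for every term in memory at once, which gives a rough sense of why building DocPerTerm over an entire document set can be memory-intensive.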