The TFIDF function always requires the output of the TF function as input. Whether the other TFIDF input tables are required or optional depends on your reason for running the function.
Table | Description |
---|---|
TF | Input to the TF function: the document set. |
DocCount | Required if you run the function to output IDF and TF-IDF values for each term in the document set. |
DocPerTerm | Optional if you run the function to output IDF and TF-IDF values for each term in the document set. If you omit this table, the function creates it by processing the entire document set, which can require a large amount of memory. If there is not enough memory to process the entire document set, the DocPerTerm table is required. |
IDF | Required if you run the function to predict TF-IDF scores. This table is the output of an earlier call to TFIDF whose inputs were the output of the TF function for the training document set, the DocCount table, and, optionally, the DocPerTerm table. |
TF Schema
Column | Data Type | Description |
---|---|---|
docid | Any | Document identifier. |
term | VARCHAR | Term. |
count | INTEGER | Number of times that term appears in document. |
TF Output and TFIDF Input Table Schema
Column | Data Type | Description |
---|---|---|
docid | Any | Document identifier. |
term | VARCHAR | Term. |
tf | DOUBLE PRECISION | Term frequency. |
count | INTEGER | Number of times that term appears in document. |
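
To make the relationship between the two schemas above concrete, the following sketch turns rows shaped like the TF table (docid, term, count) into rows shaped like the TF output (docid, term, tf, count). It is an illustration only: the sample rows and the helper name `add_tf` are hypothetical, and the normalization used here (count divided by the total term count of the document) is an assumption, since this section does not state which term-frequency formula the TF function applies.

```python
from collections import defaultdict

# Hypothetical rows in the TF schema: (docid, term, count)
rows = [
    (1, "apple", 2),
    (1, "pie", 2),
    (2, "apple", 1),
    (2, "tart", 3),
]

def add_tf(rows):
    """Return (docid, term, tf, count) rows.

    ASSUMPTION: tf = count / total number of term occurrences in the
    document. The real TF function may use a different formula.
    """
    # Total term occurrences per document
    totals = defaultdict(int)
    for docid, _term, count in rows:
        totals[docid] += count
    # Emit columns in the order of the TF output schema
    return [
        (docid, term, count / totals[docid], count)
        for docid, term, count in rows
    ]

tf_rows = add_tf(rows)
for row in tf_rows:
    print(row)
```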
DocCount Schema
Column | Data Type | Description |
---|---|---|
count | BIGINT | Number of documents in document set. |
DocPerTerm Schema
Column | Data Type | Description |
---|---|---|
term | VARCHAR | Term. |
count | BIGINT | Number of documents that contain term. |
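
The DocCount and DocPerTerm tables can both be derived from the document set, which is why supplying DocPerTerm up front spares the function from building it in memory. The sketch below shows one way to compute the two values from rows in the TF schema. The sample data and the helper names `doc_count` and `doc_per_term` are hypothetical, and the final IDF line uses the common textbook form log(N / df) as an assumption; this section does not state the formula TFIDF itself uses.

```python
import math

# Hypothetical rows in the TF schema: (docid, term, count)
tf_input = [
    (1, "apple", 2),
    (1, "pie", 2),
    (2, "apple", 1),
    (2, "tart", 3),
]

def doc_count(rows):
    """DocCount: number of distinct documents in the document set."""
    return len({docid for docid, _term, _count in rows})

def doc_per_term(rows):
    """DocPerTerm: for each term, the number of documents containing it."""
    docs_by_term = {}
    for docid, term, _count in rows:
        docs_by_term.setdefault(term, set()).add(docid)
    return {term: len(docids) for term, docids in docs_by_term.items()}

n_docs = doc_count(tf_input)
per_term = doc_per_term(tf_input)

# ASSUMPTION: textbook IDF, log(N / df); not confirmed by this section.
idf = {term: math.log(n_docs / df) for term, df in per_term.items()}
```

Under this assumed formula, a term that appears in every document (here, "apple") gets an IDF of zero, so it contributes nothing to the TF-IDF score.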