doc_id_column | BYTEINT, SMALLINT, INTEGER, BIGINT | Document identifier of document d. |
token_column | CHAR, VARCHAR | Term t. |
TD_TF | FLOAT | Term frequency of term t in document d, calculated as specified by the TFNormalization formula. |
TD_IDF | FLOAT | Inverse document frequency of term t in document corpus D, calculated as specified by the IDFNormalization formula. |
TD_TF_IDF | FLOAT | TF-IDF score of term t in document d in document corpus D, calculated as specified by the Regularization formula. |
accumulate_column | Any | Accumulate column copied from input to output. |
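
To make the relationship between the TD_TF, TD_IDF, and TD_TF_IDF output columns concrete, the following is a minimal Python sketch, not the function's actual implementation. It assumes one common choice of formulas: term frequency as the term count divided by the document length, inverse document frequency as the natural log of (number of documents / number of documents containing the term), and no regularization, so the score is simply TF × IDF. The formulas actually used depend on the TFNormalization, IDFNormalization, and Regularization settings. The name `tf_idf_rows` and the input layout are hypothetical.

```python
import math
from collections import Counter


def tf_idf_rows(docs):
    """Return (doc_id, token, tf, idf, tf_idf) tuples, one per (document, term).

    docs: dict mapping a document identifier to its list of tokens.
    Assumed formulas (illustrative only):
      tf  = count of term in document / number of tokens in document
      idf = ln(number of documents / number of documents containing term)
      tf_idf = tf * idf  (no regularization applied)
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each token appears.
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    rows = []
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        for token, count in sorted(counts.items()):
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[token])
            rows.append((doc_id, token, tf, idf, tf * idf))
    return rows


if __name__ == "__main__":
    sample = {1: ["a", "b", "a"], 2: ["b", "c"]}
    for row in tf_idf_rows(sample):
        print(row)
```

Each returned tuple mirrors one output row: the document identifier, the term, and the three computed values. Under these assumed formulas, a term that appears in every document gets an IDF of 0 and therefore a score of 0.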