TF_IDF version 2.1, TF version 1.1
SELECT * FROM TF_IDF (
  ON TF (
    ON { table | view | (query) } PARTITION BY docid
    [ Formula ({ 'normal' | 'bool' | 'log' | 'augment' }) ]
  ) AS tf PARTITION BY term
  [ ON (SELECT COUNT (DISTINCT docid) FROM doccount_table)
    AS doccount DIMENSION ]
  [ ON (SELECT term, COUNT (DISTINCT docid) FROM docperterm_table GROUP BY term)
    AS docperterm PARTITION BY term ]
  [ ON (SELECT DISTINCT (term) AS term, idf FROM tf_idf_output_table)
    AS idf PARTITION BY term ]
);
Recommended for large document sets:
SELECT * FROM TF_IDF (
  ON TF (
    ON input_table PARTITION BY docid
    [ Formula ({ 'normal' | 'bool' | 'log' | 'augment' }) ]
  ) AS tf PARTITION BY term
  ON (SELECT COUNT (DISTINCT docid) FROM doccount_table)
    AS doccount DIMENSION
  ON (SELECT term, COUNT (DISTINCT docid) FROM docperterm_table GROUP BY term)
    AS docperterm PARTITION BY term
) ORDER BY docid;
Acceptable for small document sets:
SELECT * FROM TF_IDF (
  ON TF (
    ON input_table PARTITION BY docid
  ) AS tf PARTITION BY term
  ON (SELECT COUNT (DISTINCT docid) FROM input_table)
    AS doccount DIMENSION
) ORDER BY docid;
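
For intuition, the quantities these queries combine can be sketched outside the database. The Python below is a minimal sketch, not the function's implementation. It assumes the input is one (docid, term) pair per term occurrence, it assumes the common textbook definitions for the four Formula values, and it assumes a natural-log IDF of log(doccount / docperterm). This section does not state the exact definitions or the log base that TF_IDF uses, so treat all of those as assumptions. The name tf_idf_sketch is hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_idf_sketch(rows, formula="normal"):
    """Return sorted (docid, term, tf, idf, tf_idf) tuples.

    rows: iterable of (docid, term) pairs, one per term occurrence
    (an assumed input layout, not taken from the function reference).
    """
    # Per-document term counts, analogous to TF ... PARTITION BY docid.
    counts = defaultdict(Counter)
    for docid, term in rows:
        counts[docid][term] += 1

    # Plays the role of the doccount input: number of distinct documents.
    doccount = len(counts)

    # Plays the role of the docperterm input: documents containing each term.
    docperterm = Counter()
    for terms in counts.values():
        docperterm.update(terms.keys())

    out = []
    for docid, terms in counts.items():
        total = sum(terms.values())  # all term occurrences in the document
        peak = max(terms.values())   # count of the most frequent term
        for term, f in terms.items():
            # Assumed textbook variants; the real definitions may differ.
            if formula == "normal":
                tf = f / total
            elif formula == "bool":
                tf = 1.0
            elif formula == "log":
                tf = 1.0 + math.log(f)
            elif formula == "augment":
                tf = 0.5 + 0.5 * f / peak
            else:
                raise ValueError(f"unknown formula: {formula!r}")
            idf = math.log(doccount / docperterm[term])  # assumed natural log
            out.append((docid, term, tf, idf, tf * idf))
    return sorted(out)
```

Under these assumptions, a term that appears in every document gets an IDF of 0, so its TF-IDF score is 0 whichever Formula value is chosen.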