TFIDF version 2.3, TF version 1.2
SELECT * FROM TFIDF (
  ON TF (
    ON { table | view | (query) }
    PARTITION BY docid
    [ USING Formula ({ 'normal' | 'bool' | 'log' | 'augment' }) ]
  ) AS tf PARTITION BY term
  [ ON (SELECT COUNT (DISTINCT docid) FROM doccount_table)
    AS doccount DIMENSION ]
  [ ON (SELECT term, COUNT (DISTINCT docid) FROM docperterm_table GROUP BY term)
    AS docperterm PARTITION BY term ]
  [ ON (SELECT DISTINCT (term) AS term, idf FROM tf_idf_output_table)
    AS idf PARTITION BY term ]
) AS alias;
Large Document Sets
For large document sets, the docperterm_table is required.
For training, this is the syntax for large document sets:
SELECT * FROM TFIDF (
  ON TF (
    ON { table | view | (query) }
    PARTITION BY docid
    [ USING Formula ({ 'normal' | 'bool' | 'log' | 'augment' }) ]
  ) AS tf PARTITION BY term
  ON (SELECT COUNT (DISTINCT docid) FROM doccount_table)
    AS doccount DIMENSION
  ON (SELECT term, COUNT (DISTINCT docid) FROM docperterm_table GROUP BY term)
    AS docperterm PARTITION BY term
) AS alias
ORDER BY docid;
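As an illustration only, the training syntax for large document sets might be filled in as follows. The table name train_terms and the alias tfidf_train are hypothetical, the same table is assumed to serve as both doccount_table and docperterm_table, and the 'log' formula is an arbitrary pick from the listed options.

```sql
-- Sketch of a training call for a large document set.
-- train_terms is a hypothetical input table with at least docid and term columns.
SELECT * FROM TFIDF (
  ON TF (
    ON train_terms
    PARTITION BY docid
    USING Formula ('log')          -- optional; any one of the four listed formulas
  ) AS tf PARTITION BY term
  -- total number of distinct documents, broadcast to every partition
  ON (SELECT COUNT (DISTINCT docid) FROM train_terms)
    AS doccount DIMENSION
  -- number of distinct documents containing each term
  ON (SELECT term, COUNT (DISTINCT docid) FROM train_terms GROUP BY term)
    AS docperterm PARTITION BY term
) AS tfidf_train
ORDER BY docid;
```

Here the doccount and docperterm inputs are both present, matching the training syntax, where neither appears in optional brackets.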
For prediction, this is the syntax for large document sets:
SELECT * FROM TFIDF (
  ON TF (
    ON { table | view | (query) }
    PARTITION BY docid
    [ USING Formula ({ 'normal' | 'bool' | 'log' | 'augment' }) ]
  ) AS tf PARTITION BY term
  [ ON (SELECT term, COUNT (DISTINCT docid) FROM docperterm_table GROUP BY term)
    AS docperterm PARTITION BY term ]
  [ ON (SELECT DISTINCT (term) AS term, idf FROM tf_idf_output_table)
    AS idf PARTITION BY term ]
) AS alias
ORDER BY docid;
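A hedged sketch of one way the prediction syntax might be filled in is shown below. The names new_terms, tfidf_train_out, and tfidf_pred are hypothetical. The sketch assumes the output of an earlier training run was saved to tfidf_train_out with term and idf columns, and it supplies only the optional idf input. Whether a given deployment also needs the docperterm input is not settled by the syntax alone.

```sql
-- Sketch of a prediction call that reuses IDF values from an earlier run.
-- new_terms: hypothetical table of documents to score (docid, term, ...).
-- tfidf_train_out: hypothetical table holding saved TFIDF training output.
SELECT * FROM TFIDF (
  ON TF (
    ON new_terms
    PARTITION BY docid
    USING Formula ('log')          -- assumed to match the formula used in training
  ) AS tf PARTITION BY term
  -- one idf value per term, taken from the saved training output
  ON (SELECT DISTINCT (term) AS term, idf FROM tfidf_train_out)
    AS idf PARTITION BY term
) AS tfidf_pred
ORDER BY docid;
```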
Small Document Sets
This syntax is acceptable for small document sets:
SELECT * FROM TFIDF (
  ON TF (
    ON { table | view | (query) }
    PARTITION BY docid
  ) AS tf PARTITION BY term
  ON (SELECT COUNT (DISTINCT docid) FROM input_table)
    AS doccount DIMENSION
) AS alias
ORDER BY docid;
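For a concrete picture, the small-document-set syntax could be instantiated as in the sketch below. The table name doc_terms and the alias tfidf_small are hypothetical, and doc_terms is assumed to have at least docid and term columns.

```sql
-- Sketch of a call for a small document set.
-- doc_terms is a hypothetical input table; it stands in for both the
-- { table | view | (query) } placeholder and input_table.
SELECT * FROM TFIDF (
  ON TF (
    ON doc_terms
    PARTITION BY docid             -- no Formula argument, as in the syntax above
  ) AS tf PARTITION BY term
  -- document count computed directly from the same input table
  ON (SELECT COUNT (DISTINCT docid) FROM doc_terms)
    AS doccount DIMENSION
) AS tfidf_small
ORDER BY docid;
```

Only the doccount input is supplied here, and it is computed from the same table that feeds TF.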