TFIDF Function Syntax | Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 9.02, 9.01, 2.0, 1.3
Published: February 2022
Language: English (United States)
Last Update: 2022-02-10
dita:id: B700-4003
Lifecycle: previous
Product Category: Teradata Vantage™

TFIDF version 2.3, TF version 1.2

SELECT * FROM TFIDF (
  ON TF (
    ON { table | view | (query) } PARTITION BY docid     
    [ USING Formula ({ 'normal' | 'bool' | 'log' | 'augment' }) ]
  ) AS TF PARTITION BY term 
  [ ON (SELECT COUNT (DISTINCT docid) FROM doccount_table) AS DocCount DIMENSION ]
  [ ON (SELECT term, COUNT (DISTINCT docid) FROM docperterm_table GROUP BY term)
      AS DocPerTerm PARTITION BY term
  ]
  [ ON (SELECT DISTINCT (term) AS term, idf FROM tf_idf_output_table)
      AS IDF PARTITION BY term
  ]
) AS alias;

Large Document Sets

For large document sets, the DocPerTerm table is required.

For training, this is the syntax for large document sets:

SELECT * FROM TFIDF (
  ON TF (
    ON { table | view | (query) } PARTITION BY docid      
    [ USING Formula ({ 'normal' | 'bool' | 'log' | 'augment' }) ]
  ) AS TF PARTITION BY term 
  ON (SELECT COUNT (DISTINCT docid) FROM doccount_table) AS DocCount DIMENSION
  ON (SELECT term, COUNT (DISTINCT docid) FROM docperterm_table GROUP BY term)
    AS DocPerTerm PARTITION BY term 
) AS alias ORDER BY docid;
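
As an illustration, the training syntax above might be filled in as follows. This is a minimal sketch, not a tested query: train_terms is a hypothetical table assumed to hold one row per term occurrence with docid and term columns, the same table is assumed to serve as both doccount_table and docperterm_table, and the 'log' formula and the alias dt are arbitrary choices.

```sql
-- Hypothetical input: train_terms(docid, term), one row per term occurrence.
SELECT * FROM TFIDF (
  ON TF (
    ON train_terms PARTITION BY docid
    USING Formula ('log')               -- any of 'normal', 'bool', 'log', 'augment'
  ) AS TF PARTITION BY term
  -- Total number of distinct documents, broadcast as a dimension input.
  ON (SELECT COUNT (DISTINCT docid) FROM train_terms) AS DocCount DIMENSION
  -- Number of distinct documents that contain each term.
  ON (SELECT term, COUNT (DISTINCT docid) FROM train_terms GROUP BY term)
    AS DocPerTerm PARTITION BY term
) AS dt ORDER BY docid;
```

To reuse the result for prediction, the output would need to be saved to a table so that its term and idf columns can be read back later.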

For prediction, this is the syntax for large document sets:

SELECT * FROM TFIDF (
  ON TF (
    ON { table | view | (query) } PARTITION BY docid      
    [ USING Formula ({ 'normal' | 'bool' | 'log' | 'augment' }) ]
  ) AS TF PARTITION BY term 
  [ ON (SELECT term, COUNT (DISTINCT docid) FROM docperterm_table GROUP BY term)
      AS DocPerTerm PARTITION BY term
  ]
  [ ON (SELECT DISTINCT (term) AS term, idf FROM tf_idf_output_table)
      AS IDF PARTITION BY term
  ]
) AS alias ORDER BY docid;
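
A possible instantiation of the prediction syntax, using only the IDF input, is sketched below. The names are assumptions for illustration: new_terms is a hypothetical table of docid and term rows for the documents to score, and tfidf_train_output is a hypothetical table where the output of an earlier training run was saved. Repeating the training Formula here is a choice made for consistency, not a requirement stated above.

```sql
-- Hypothetical inputs:
--   new_terms(docid, term)        terms of the documents to score
--   tfidf_train_output(term, idf, ...)  saved output of a training run
SELECT * FROM TFIDF (
  ON TF (
    ON new_terms PARTITION BY docid
    USING Formula ('log')               -- assumed to match the training run
  ) AS TF PARTITION BY term
  -- Reuse the idf values computed during training, one row per term.
  ON (SELECT DISTINCT (term) AS term, idf FROM tfidf_train_output)
    AS IDF PARTITION BY term
) AS dt ORDER BY docid;
```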

Small Document Sets

This syntax is acceptable for small document sets:

SELECT * FROM TFIDF (
  ON TF (
    ON { table | view | (query) } PARTITION BY docid 
  ) AS TF PARTITION BY term 
  ON (SELECT COUNT (DISTINCT docid) FROM input_table) AS DocCount DIMENSION
) AS alias ORDER BY docid;
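
For instance, with a hypothetical table doc_terms that has docid and term columns (an assumption for this sketch, as is the alias dt), the small-document-set form could look like this:

```sql
-- Hypothetical input: doc_terms(docid, term), one row per term occurrence.
SELECT * FROM TFIDF (
  ON TF (
    ON doc_terms PARTITION BY docid     -- Formula omitted, as in the syntax above
  ) AS TF PARTITION BY term
  -- The document count is taken from the same input table.
  ON (SELECT COUNT (DISTINCT docid) FROM doc_terms) AS DocCount DIMENSION
) AS dt ORDER BY docid;
```

Here the only extra input is DocCount, computed from the same table that feeds TF.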