7.00.02 - Input - Aster Analytics

Teradata Aster® Analytics Foundation User Guide, Update 2

Product: Aster Analytics
Release Number: 7.00.02
Release Date: September 2017
Content Type: Programming Reference, User Guide
Publication ID: B700-1022-700K
Language: English (United States)

The TF_IDF function always takes as input the output of the TF function, whose own input is the document set. Which other input tables TF_IDF requires depends on why you are running it:

  • If you are running TF_IDF to output the IDF and TF-IDF values for each term in the document set, TF_IDF also requires the doccount input table and optionally accepts the docperterm input table.
  • If you are running TF_IDF to predict TF-IDF values, TF_IDF also requires the idf input table. The idf table is the output of an earlier TF_IDF call whose inputs were the TF output for the training document set, the doccount table, and optionally the docperterm table.
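The two modes above differ only in where the idf values come from. The following is a minimal sketch of that data flow on plain Python lists and dicts. It does not call the Aster API: `compute_idf` and `tf_idf` are hypothetical helpers. The idf formula, ln(doccount / documents containing the term), is the common textbook definition and is an assumption here. TF_IDF's actual formula may differ.

```python
import math

# Hypothetical helpers (not part of the Aster API) that mimic the
# TF_IDF data flow on plain Python data structures.


def compute_idf(doccount, docperterm):
    """Training mode: derive one idf value per term.

    doccount:   number of documents in the document set (doccount table)
    docperterm: {term: number of documents containing the term} (docperterm table)
    ASSUMPTION: idf = ln(doccount / documents containing term); the formula
    that TF_IDF actually uses may differ.
    """
    return {term: math.log(doccount / df) for term, df in docperterm.items()}


def tf_idf(tf_rows, idf):
    """Multiply each TF output row's tf by the idf of its term.

    tf_rows: (docid, term, tf, count) tuples, as in the TF output schema.
    idf:     {term: idf}, either just computed (training mode) or kept
             from an earlier run (prediction mode).
    ASSUMPTION: terms absent from idf are dropped in this sketch.
    """
    return [
        (docid, term, tf * idf[term])
        for docid, term, tf, _count in tf_rows
        if term in idf
    ]


if __name__ == "__main__":
    # Training: TF output for two documents, plus doccount and docperterm.
    training_tf = [(1, "aster", 0.5, 1), (1, "data", 0.5, 1), (2, "data", 1.0, 2)]
    idf = compute_idf(doccount=2, docperterm={"aster": 1, "data": 2})
    print(tf_idf(training_tf, idf))

    # Prediction: reuse the stored idf values for a new document.
    new_tf = [(10, "aster", 1.0, 3), (10, "unseen", 1.0, 1)]
    print(tf_idf(new_tf, idf))
```

Keeping idf as a separate step mirrors the document's point that prediction reuses the idf output of an earlier training run rather than recomputing it from the new documents.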

If you omit the docperterm table, the function builds it by processing the entire document set, which can require a large amount of memory. If there is not enough memory to process the entire document set, you must supply the docperterm table.

TF Input Table (Document Set) Schema

| Column Name | Data Type | Description |
|---|---|---|
| docid | Any | Document identifier. |
| term | VARCHAR | Term. |
| count | INTEGER | Number of times the term appears in the document. |

TF Output and TF_IDF Input Table Schema

| Column Name | Data Type | Description |
|---|---|---|
| docid | Any | Document identifier. |
| term | VARCHAR | Term. |
| tf | DOUBLE PRECISION | Term frequency. |
| count | INTEGER | Number of times the term appears in the document. |

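The TF step turns rows of the document-set schema into rows of the TF output schema by adding a tf column. The following is a minimal sketch that assumes tf is the term's count divided by the total count of all terms in the same document. This is one common definition. The TF function may use or offer other formulas. `term_frequency` is a hypothetical helper, not an Aster function.

```python
from collections import defaultdict


def term_frequency(doc_rows):
    """Map TF input rows to TF output rows.

    doc_rows: iterable of (docid, term, count), as in the TF input schema.
    Returns a list of (docid, term, tf, count), as in the TF output schema.
    ASSUMPTION: tf = count / total count of all terms in the same document,
    and every document has a positive total count.
    """
    rows = list(doc_rows)

    # First pass: total number of term occurrences per document.
    totals = defaultdict(int)
    for docid, _term, count in rows:
        totals[docid] += count

    # Second pass: normalize each term's count by its document's total.
    return [(docid, term, count / totals[docid], count) for docid, term, count in rows]


if __name__ == "__main__":
    docs = [(1, "aster", 1), (1, "data", 3), (2, "data", 2)]
    for row in term_frequency(docs):
        print(row)
```

The count column is carried through unchanged, which matches the output schema above listing both tf and count.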
TF_IDF doccount Table Schema

| Column Name | Data Type | Description |
|---|---|---|
| count | BIGINT | Number of documents in the document set. |

TF_IDF docperterm Table Schema

| Column Name | Data Type | Description |
|---|---|---|
| term | VARCHAR | Term. |
| count | BIGINT | Number of documents that contain the term. |
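Both the doccount and docperterm tables can be derived from the document set. In SQL terms they correspond to a COUNT(DISTINCT docid) over the whole set and a COUNT(DISTINCT docid) grouped by term. The following is a minimal sketch in Python, using a hypothetical helper `doc_counts` that is not part of the Aster API. It assumes each input row means the term occurs at least once in that document. It keeps every distinct (term, docid) pair in memory. That illustrates why building docperterm over the entire document set can be memory-intensive.

```python
from collections import defaultdict


def doc_counts(doc_rows):
    """Derive doccount and docperterm from TF input rows.

    doc_rows: iterable of (docid, term, count), as in the TF input schema.
    Returns (doccount, docperterm), where docperterm is {term: number of
    documents that contain the term}.
    ASSUMPTION: every row means the term occurs in that document.
    """
    all_docs = set()
    docs_per_term = defaultdict(set)  # term -> set of docids (held in memory)
    for docid, term, _count in doc_rows:
        all_docs.add(docid)
        docs_per_term[term].add(docid)

    doccount = len(all_docs)
    docperterm = {term: len(docids) for term, docids in docs_per_term.items()}
    return doccount, docperterm


if __name__ == "__main__":
    docs = [(1, "aster", 1), (1, "data", 3), (2, "data", 2)]
    print(doc_counts(docs))
```

Using sets means duplicate (docid, term) rows are counted once per document, matching the "number of documents that contain the term" definition.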