Teradata Python Package Function Reference - 16.20 - TextClassifier

Teradata® Python Package Function Reference

Product: Teradata Python Package
Release: 16.20
Created: February 2020
Category: Programming Reference
Document number: B700-4008-098K

 
teradataml.analytics.mle.TextClassifier = class TextClassifier(builtins.object)
     Methods defined here:
__init__(self, model_file=None, newdata=None, text_column=None, accumulate=None, newdata_sequence_column=None, newdata_order_column=None)
DESCRIPTION:
    The TextClassifier function classifies input text using a model
    generated by the TextClassifierTrainer function.
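    To make "classifies input text using a model" concrete, the following is
    a minimal, self-contained sketch of k-nearest-neighbor (kNN) text
    classification, the classifier type requested by classifier_type='knn'
    in the example. It is an illustration only, not the TextClassifier
    implementation: the tokenizer, the stopword set, cosine similarity, k=3,
    and the helper names tokenize, cosine, train and classify are all
    assumptions made for this sketch.

```python
import math
import re
from collections import Counter

# Hypothetical stopword set. The example in this reference instead names a
# stopwords file through nlp_parameters ('stopwordsFile:stopwords.txt').
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase the text, split on non-letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(docs):
    """Toy 'model': one (term-frequency vector, category) pair per document."""
    return [(Counter(tokenize(text)), category) for text, category in docs]


def classify(model, text, k=3):
    """Return the majority category among the k most similar training docs."""
    vec = Counter(tokenize(text))
    # sorted() is stable, so equally similar documents keep training order.
    nearest = sorted(model, key=lambda m: cosine(vec, m[0]), reverse=True)[:k]
    votes = Counter(category for _, category in nearest)
    return votes.most_common(1)[0][0]


training = [
    ("stock markets rallied as shares rose", "finance"),
    ("the central bank raised interest rates", "finance"),
    ("investors sold bonds and shares", "finance"),
    ("the team won the football match", "sports"),
    ("a late goal decided the match", "sports"),
    ("the striker scored twice in the final", "sports"),
]
model = train(training)
print(classify(model, "shares and bonds fell on rates news"))  # finance
print(classify(model, "goal in the final match"))              # sports
```

    Keeping the training vectors as the "model" mirrors the kNN idea of
    classifying by similarity to stored examples; the real model file's
    format is not described in this reference.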
 
 
PARAMETERS:
    model_file:
        Required Argument.
        Specifies the name of the model file that was generated by the
        TextClassifierTrainer function and installed in the database.
        Types: str
 
    newdata:
        Required Argument.
        Specifies the teradataml DataFrame that contains the text to be
        classified.
        Types: teradataml DataFrame
 
    newdata_order_column:
        Required Argument.
        Specifies the Order By column(s) for newdata.
        Provide a list of column names if multiple columns are used
        for ordering.
        Types: str OR list of Strings (str)
 
    text_column:
        Required Argument.
        Specifies the column of the input teradataml DataFrame that
        contains the text to be classified.
        Types: str
 
    accumulate:
        Optional Argument.
        Specifies the names of the input columns to copy to the output
        teradataml DataFrame.
        Types: str OR list of Strings (str)
 
    newdata_sequence_column:
        Optional Argument.
        Specifies the column(s) that uniquely identify each row of the
        input argument "newdata". The argument is used to ensure
        deterministic results for functions that produce results that
        vary from run to run.
        Types: str OR list of Strings (str)
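    How text_column, accumulate and newdata_order_column shape the output
    can be pictured with a pure-Python mock that works on in-memory rows
    instead of a teradataml DataFrame. This is a hypothetical sketch, not
    how the function executes in the database: classify_rows, as_list, the
    predict callback and the output column name predicted_category are all
    invented for illustration.

```python
from typing import Callable, Dict, List, Sequence, Union

Row = Dict[str, object]


def as_list(cols: Union[str, Sequence[str], None]) -> List[str]:
    """Normalize the 'str OR list of Strings (str)' argument convention."""
    if cols is None:
        return []
    return [cols] if isinstance(cols, str) else list(cols)


def classify_rows(rows: List[Row],
                  text_column: str,
                  predict: Callable[[str], str],
                  accumulate=None,
                  order_column=None) -> List[Row]:
    """Hypothetical mock of how the arguments determine the output rows."""
    order_cols = as_list(order_column)
    # In this mock, sorting by the order column(s) fixes the processing
    # order, so the output does not depend on how the rows were stored.
    if order_cols:
        rows = sorted(rows, key=lambda r: tuple(r[c] for c in order_cols))
    out = []
    for row in rows:
        # Only the accumulate columns are copied through to the output.
        result = {c: row[c] for c in as_list(accumulate)}
        # 'predicted_category' is a hypothetical output column name.
        result["predicted_category"] = predict(row[text_column])
        out.append(result)
    return out


rows = [
    {"id": 2, "category": "sports", "content": "late goal"},
    {"id": 1, "category": "finance", "content": "bond yields"},
]
out = classify_rows(rows, "content",
                    predict=lambda text: "sports" if "goal" in text else "finance",
                    accumulate=["id", "category"], order_column="id")
print([r["id"] for r in out])  # [1, 2]
```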
 
RETURNS:
    Instance of TextClassifier.
    Output teradataml DataFrames can be accessed using attribute
    references, such as TextClassifierObj.<attribute_name>.
    Output teradataml DataFrame attribute name is:
        result
 
 
RAISES:
    TeradataMlException
 
 
EXAMPLES:
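    # Import the required classes and functions.
    # Assumes a connection to the Teradata database has already been
    # created, for example with create_context().
    from teradataml import DataFrame
    from teradataml.data.load_example_data import load_example_data
    from teradataml.analytics.mle import TextClassifier, TextClassifierTrainer

    # Load example data.
    load_example_data("textclassifiertrainer", "texttrainer_input")
    load_example_data("textclassifier", "textclassifier_input")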
 
    # Create teradataml DataFrame objects.
    # The input table "texttrainer_input" contains text of the training
    # documents and the category of the training documents.
    texttrainer_input = DataFrame.from_table("texttrainer_input")
 
    # The input table "textclassifier_input" contains the text to be
    # classified.
    textclassifier_input = DataFrame.from_table("textclassifier_input")
 
    # Generate model file using TextClassifierTrainer function.
    textclassifiertrainer_out = TextClassifierTrainer(data=texttrainer_input,
                                                      text_column='content',
                                                      category_column='category',
                                                      classifier_type='knn',
                                                      model_file='knn.bin',
                                                      nlp_parameters=['useStem:true','stopwordsFile:stopwords.txt'],
                                                      classifier_parameters='compress:0.9',
                                                      feature_selection='DF:[0.1:0.99]',
                                                      data_sequence_column='id'
                                                      )
 
    # Example 1 - This example uses model_file "knn.bin" generated by
    # TextClassifierTrainer function to classify the input text.
    TextClassifier_out = TextClassifier(newdata=textclassifier_input,
                                        model_file="knn.bin",
                                        text_column="content",
                                        accumulate=["id", "category"],
                                        newdata_order_column="id"
                                        )

    # Print the output. The result teradataml DataFrame is also available
    # as TextClassifier_out.result.
    print(TextClassifier_out)
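    The trainer call above passes feature_selection='DF:[0.1:0.99]'.
    Assuming this keeps only terms whose document frequency (the fraction
    of training documents that contain the term) lies between 0.1 and 0.99,
    the filtering step can be sketched in plain Python as follows. The
    inclusive reading of the bounds and the helper name select_by_df are
    assumptions, not the documented behavior of TextClassifierTrainer.

```python
from collections import Counter


def select_by_df(tokenized_docs, low=0.1, high=0.99):
    """Keep terms whose document-frequency fraction lies in [low, high].

    Assumed reading of feature_selection='DF:[low:high]'; bounds inclusive.
    """
    n = len(tokenized_docs)
    # Count each term at most once per document.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    return {term for term, count in df.items() if low <= count / n <= high}


docs = [
    ["shares", "rose", "market"],
    ["shares", "fell"],
    ["match", "goal", "market"],
    ["goal", "shares"],
]
# "shares" appears in 3 of 4 documents (0.75), so it falls outside [0.5, 0.7].
print(sorted(select_by_df(docs, low=0.5, high=0.7)))  # ['goal', 'market']
```

    An upper bound below 1.0 drops terms that occur in nearly every
    document, which carry little information for telling categories apart.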
__repr__(self)
Returns the string representation for a TextClassifier class instance.