Teradata Package for Python Function Reference | 17.10 - NaiveBayesTextClassifier - Teradata Package for Python

Teradata® Package for Python Function Reference

Product: Teradata Package for Python
Release Number: 17.10
Published: April 2022
Language: English (United States)
Last Update: 2022-08-19
Product Category: Teradata Vantage

teradataml.analytics.mle.NaiveBayesTextClassifier = class NaiveBayesTextClassifier(builtins.object)

Methods defined here:

__init__(self, data=None, token_column=None, doc_id_columns=None, doc_category_column=None, model_type='MULTINOMIAL', categories=None, category_column='[0:0]', prediction_categories=None, stopwords=None, stopwords_column=None, stopwords_list=None, data_sequence_column=None, stopwords_sequence_column=None, categories_sequence_column=None, data_partition_column=None, data_order_column=None, stopwords_order_column=None, categories_order_column=None):
    DESCRIPTION:
        The NaiveBayesTextClassifierTrainer function takes training data as
        input and outputs a model table.

    PARAMETERS:
        data:
            Required Argument.
            The teradataml DataFrame defining the training tokens.

        data_partition_column:
            Required Argument.
            Specifies Partition By columns for data.
            Values to this argument can be provided as a list, if multiple
            columns are used for partitioning.
            Types: str OR list of Strings (str)

        data_order_column:
            Optional Argument.
            Specifies Order By columns for data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        token_column:
            Required Argument.
            Specifies the name of the token_table column that contains the
            tokens to be classified.
            Types: str

        doc_id_columns:
            Optional Argument. Required when "model_type" argument is
            'BERNOULLI'.
            Specifies the names of the token_table columns that contain the
            document identifier.
            Types: str OR list of Strings (str)
            Note:
                This argument should not be provided when "model_type" is
                'MULTINOMIAL'.

        doc_category_column:
            Required Argument.
            Specifies the name of the token_table column that contains the
            document category.
            Types: str

        model_type:
            Optional Argument.
            Specifies the model type of the text classifier.
            The formulas for the two model types follow this table.
            Default Value: "MULTINOMIAL"
            Permitted Values: MULTINOMIAL, BERNOULLI
            Types: str

        categories:
            Optional Argument.
            The teradataml DataFrame defining allowed categories.

        categories_order_column:
            Optional Argument.
            Specifies Order By columns for categories.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        category_column:
            Optional Argument.
            Specifies the name of the categories_table column that contains
            the prediction categories. The default value is the first column
            of categories_table.
            Default Value: "[0:0]"
            Types: str

        prediction_categories:
            Optional Argument.
            Specifies the prediction categories.
            Note:
                Specify either this argument or the categories_table, but
                not both.
            Types: str OR list of Strings (str)

        stopwords:
            Optional Argument.
            The teradataml DataFrame defining stop words.

        stopwords_order_column:
            Optional Argument.
            Specifies Order By columns for stopwords.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        stopwords_column:
            Optional Argument.
            Specifies the name of the stop_words_table column that contains
            the stop words. The default value is the first column of
            stop_words_table.
            Types: str

        stopwords_list:
            Optional Argument.
            Specifies words to ignore (such as a, an, and the).
            Note:
                Specify either this argument or the stop_words_table, but
                not both.
            Types: str OR list of Strings (str)

        data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each
            row of the input argument "data". The argument is used to ensure
            deterministic results for functions which produce results that
            vary from run to run.
            Types: str OR list of Strings (str)

        stopwords_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each
            row of the input argument "stopwords". The argument is used to
            ensure deterministic results for functions which produce results
            that vary from run to run.
            Types: str OR list of Strings (str)

        categories_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each
            row of the input argument "categories". The argument is used to
            ensure deterministic results for functions which produce results
            that vary from run to run.
            Types: str OR list of Strings (str)

    RETURNS:
        Instance of NaiveBayesTextClassifier.
        Output teradataml DataFrames can be accessed using attribute
        references, such as NaiveBayesTextClassifierObj.<attribute_name>.
        Output teradataml DataFrame attribute name is:
            result

    RAISES:
        TeradataMlException

    EXAMPLES:
        # Load the data to run the example.
        load_example_data("NaiveBayesTextClassifier", "token_table")

        # Create teradataml DataFrame.
        token_table = DataFrame.from_table("token_table")

        # Example 1 - Train a BERNOULLI model on token_table.
        nbt_result = NaiveBayesTextClassifier(data = token_table,
                                              token_column = 'token',
                                              doc_id_columns = 'doc_id',
                                              doc_category_column = 'category',
                                              model_type = "BERNOULLI",
                                              data_partition_column = 'category')

        # Print the result DataFrame.
        print(nbt_result.result)
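The argument rules in the PARAMETERS list interact in a few ways that are easy to miss: "doc_id_columns" depends on "model_type", and two pairs of arguments are mutually exclusive. The following is a minimal sketch of a pre-flight check that restates those documented rules in plain Python. The helper name check_nbtc_args and its messages are hypothetical and are not part of teradataml; the library performs its own validation and raises TeradataMlException, and this sketch makes no claim about how that validation is implemented.

```python
# Hypothetical helper (not part of teradataml): restates the documented
# argument rules so a call can be sanity-checked before it is sent.
PERMITTED_MODEL_TYPES = ("MULTINOMIAL", "BERNOULLI")


def check_nbtc_args(model_type="MULTINOMIAL", doc_id_columns=None,
                    categories=None, prediction_categories=None,
                    stopwords=None, stopwords_list=None):
    """Return a list of problems; an empty list means no rule is violated."""
    problems = []

    # model_type must be one of the documented permitted values.
    if model_type not in PERMITTED_MODEL_TYPES:
        problems.append("model_type must be MULTINOMIAL or BERNOULLI")

    # doc_id_columns is required for BERNOULLI ...
    if model_type == "BERNOULLI" and not doc_id_columns:
        problems.append("doc_id_columns is required when model_type is BERNOULLI")

    # ... and should not be provided for MULTINOMIAL.
    if model_type == "MULTINOMIAL" and doc_id_columns:
        problems.append("doc_id_columns should not be provided when model_type is MULTINOMIAL")

    # Either a categories DataFrame or a prediction_categories list, not both.
    if categories is not None and prediction_categories is not None:
        problems.append("specify either categories or prediction_categories, not both")

    # Either a stopwords DataFrame or a stopwords_list, not both.
    if stopwords is not None and stopwords_list is not None:
        problems.append("specify either stopwords or stopwords_list, not both")

    return problems


if __name__ == "__main__":
    # Mirrors the arguments used in Example 1; expected to report no problems.
    print(check_nbtc_args(model_type="BERNOULLI", doc_id_columns="doc_id"))
    # Default model type with a document identifier; reports one problem.
    print(check_nbtc_args(doc_id_columns="doc_id"))
```

Returning a list rather than raising on the first violation is a design choice for this sketch: it lets a caller see every conflicting argument at once.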

__repr__(self): Returns the string representation for a NaiveBayesTextClassifier class instance.

get_build_time(self): Returns the build time of the algorithm in seconds. When the model object is created using retrieve_model(), the returned value is the one saved in the Model Catalog.

get_prediction_type(self): Returns the prediction type of the algorithm. When the model object is created using retrieve_model(), the returned value is the one saved in the Model Catalog.

get_target_column(self): Returns the target column of the algorithm. When the model object is created using retrieve_model(), the returned value is the one saved in the Model Catalog.

show_query(self): Returns the underlying SQL query. When the model object is created using retrieve_model(), None is returned.