Methods defined here:
- __init__(self, data=None, token_column=None, doc_id_columns=None, doc_category_column=None, model_type='MULTINOMIAL', categories=None, category_column='[0:0]', prediction_categories=None, stopwords=None, stopwords_column=None, stopwords_list=None, data_sequence_column=None, stopwords_sequence_column=None, categories_sequence_column=None, data_partition_column=None, data_order_column=None, stopwords_order_column=None, categories_order_column=None)
- DESCRIPTION:
The NaiveBayesTextClassifierTrainer function takes training data as
input and outputs a model table.
PARAMETERS:
data:
Required Argument.
The teradataml DataFrame defining the training tokens.
data_partition_column:
Required Argument.
Specifies Partition By columns for data.
Values to this argument can be provided as a list, if multiple columns
are used for partitioning.
Types: str OR list of Strings (str)
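Conceptually, the Partition By columns group input rows so that all rows sharing the same partition values are processed together. The plain-Python sketch below shows only that grouping idea; it is not how Teradata executes the query. The helper `partition_by` and the sample rows are hypothetical, and the helper accepts a single name or a list to mirror the argument's type.

```python
from collections import defaultdict

# Hypothetical token rows, for illustration only.
rows = [
    {"doc_id": 1, "category": "sports", "token": "ball"},
    {"doc_id": 2, "category": "tech", "token": "cpu"},
    {"doc_id": 1, "category": "sports", "token": "goal"},
    {"doc_id": 3, "category": "tech", "token": "ram"},
]


def partition_by(rows, columns):
    """Group rows by the values of `columns`, like a SQL PARTITION BY.

    `columns` may be one column name or a list of names (hypothetical helper).
    """
    if isinstance(columns, str):
        columns = [columns]
    partitions = defaultdict(list)
    for row in rows:
        # Composite key: one value per partitioning column.
        key = tuple(row[c] for c in columns)
        partitions[key].append(row)
    return dict(partitions)


by_category = partition_by(rows, "category")
for key, part in sorted(by_category.items()):
    print(key, [r["token"] for r in part])
```

Passing a list such as `["category", "doc_id"]` yields one group per distinct combination of the two values.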
data_order_column:
Optional Argument.
Specifies Order By columns for data.
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
Types: str OR list of Strings (str)
token_column:
Required Argument.
Specifies the name of the column in "data" (the token table) that
contains the tokens to be classified.
Types: str
doc_id_columns:
Optional Argument. Required when "model_type" argument is 'BERNOULLI'.
Specifies the names of the columns in "data" (the token table) that
contain the document identifier.
Types: str OR list of Strings (str)
Note:
This argument should not be provided when "model_type" is 'MULTINOMIAL'.
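The requirement above matches how the two model types count tokens in textbook naive Bayes: a multinomial model counts every occurrence of a token in a category, while a Bernoulli model counts how many distinct documents in a category contain the token, which is only possible when rows can be grouped by document. The sketch below illustrates that counting difference in plain Python under that assumption; the rows and both helpers are hypothetical and are not Teradata's implementation.

```python
from collections import Counter

# Hypothetical token rows: document 1 contains "ball" twice.
rows = [
    {"doc_id": 1, "category": "sports", "token": "ball"},
    {"doc_id": 1, "category": "sports", "token": "ball"},
    {"doc_id": 2, "category": "sports", "token": "ball"},
    {"doc_id": 2, "category": "sports", "token": "goal"},
]


def token_counts(rows):
    """Multinomial view: every occurrence of a token counts, per category."""
    return Counter((r["category"], r["token"]) for r in rows)


def document_counts(rows, doc_id_columns):
    """Bernoulli view: distinct documents per category that contain a token."""
    if isinstance(doc_id_columns, str):
        doc_id_columns = [doc_id_columns]
    # Deduplicate (category, token, document) so repeats inside one document count once.
    seen = {
        (r["category"], r["token"], tuple(r[c] for c in doc_id_columns))
        for r in rows
    }
    return Counter((category, token) for category, token, _ in seen)


print(token_counts(rows)[("sports", "ball")])               # 3 occurrences
print(document_counts(rows, "doc_id")[("sports", "ball")])  # 2 documents
```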
doc_category_column:
Required Argument.
Specifies the name of the column in "data" (the token table) that
contains the document category.
Types: str
model_type:
Optional Argument.
Specifies the model type of the text classifier.
Default Value: "MULTINOMIAL"
Permitted Values: MULTINOMIAL, BERNOULLI
Types: str
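For reference, the standard textbook add-one-smoothed estimates of the probability of token t given category c for the two model types are shown below. They are a common formulation and not necessarily the exact formulas, or the smoothing, that the function uses. Here T_{ct} is the number of occurrences of t in category c, V is the vocabulary, N_{ct} is the number of documents in c that contain t, and N_c is the number of documents in c.

```latex
% MULTINOMIAL: based on token occurrence counts
\hat{P}(t \mid c) = \frac{T_{ct} + 1}{\sum_{t' \in V} T_{ct'} + |V|}

% BERNOULLI: based on document counts (hence the need for document identifiers)
\hat{P}(t \mid c) = \frac{N_{ct} + 1}{N_c + 2}
```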
categories:
Optional Argument.
The teradataml DataFrame defining allowed categories.
categories_order_column:
Optional Argument.
Specifies Order By columns for categories.
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
Types: str OR list of Strings (str)
category_column:
Optional Argument.
Specifies the name of the column in "categories" that contains the
prediction categories. The default value, "[0:0]", selects the first
column of "categories".
Default Value: "[0:0]"
Types: str
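The default "[0:0]" uses a column-range notation in which "[start:end]" is read as an inclusive, 0-based range of column positions, so "[0:0]" names only the first column. The sketch below resolves such a value against a list of column names. It assumes that inclusive, 0-based reading; the helper `resolve_column_range` and the column names are hypothetical.

```python
import re


def resolve_column_range(spec, columns):
    """Resolve "[start:end]" (assumed inclusive, 0-based) against column names.

    Hypothetical helper: a plain column name is returned unchanged.
    """
    match = re.fullmatch(r"\[(\d+):(\d+)\]", spec)
    if match is None:
        return [spec]
    start, end = int(match.group(1)), int(match.group(2))
    # Inclusive end, so slice one past it.
    return columns[start:end + 1]


print(resolve_column_range("[0:0]", ["category", "description"]))
```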
prediction_categories:
Optional Argument.
Specifies the prediction categories.
Note: Specify either this argument or the "categories" argument, but not both.
Types: str OR list of Strings (str)
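The "either ... or ..., but not both" rule above can be pictured as a small validation step. In the sketch below, the hypothetical helper `resolve_prediction_categories` takes the category values (standing in for those read from the "categories" DataFrame) or the "prediction_categories" value, rejects both together, and normalizes a single string to a list. The ValueError is an illustrative choice; the class itself documents TeradataMlException.

```python
def resolve_prediction_categories(categories=None, prediction_categories=None):
    """Hypothetical helper enforcing that at most one source of categories is given."""
    if categories is not None and prediction_categories is not None:
        raise ValueError(
            'Specify either "categories" or "prediction_categories", not both.'
        )
    chosen = categories if categories is not None else prediction_categories
    # A single string is treated as a one-element list, as the argument allows.
    if isinstance(chosen, str):
        chosen = [chosen]
    return chosen


print(resolve_prediction_categories(prediction_categories="sports"))
```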
stopwords:
Optional Argument.
The teradataml DataFrame defining stop words.
stopwords_order_column:
Optional Argument.
Specifies Order By columns for stopwords.
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
Types: str OR list of Strings (str)
stopwords_column:
Optional Argument.
Specifies the name of the column in "stopwords" that contains the
stop words. The default value is the first column of "stopwords".
Types: str
stopwords_list:
Optional Argument.
Specifies words to ignore (such as "a", "an", and "the").
Note: Specify either this argument or the "stopwords" argument, but not both.
Types: str OR list of Strings (str)
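Ignoring stop words amounts to dropping matching tokens before they are counted. The sketch below shows that filtering with an exact, case-sensitive match; whether the function compares case-insensitively is not stated here, so that matching rule is an assumption, and `remove_stopwords` is a hypothetical helper.

```python
def remove_stopwords(tokens, stopwords_list):
    """Drop tokens that exactly match an entry in the stop-word list (hypothetical helper)."""
    if isinstance(stopwords_list, str):
        stopwords_list = [stopwords_list]
    stop = set(stopwords_list)  # set membership keeps the check O(1) per token
    return [token for token in tokens if token not in stop]


print(remove_stopwords(["the", "ball", "a", "goal"], ["a", "an", "the"]))
```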
data_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
stopwords_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "stopwords". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
categories_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "categories". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
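The sequence-column arguments above rely on a simple principle: if rows are put in order by columns that uniquely identify each row, the order no longer depends on how the rows happened to arrive. The sketch below demonstrates only that principle in plain Python. It is not the mechanism teradataml uses, and the rows and the `ordered` helper are hypothetical.

```python
import random

# Hypothetical rows: "doc_id" alone is not unique, ("doc_id", "token") is.
rows = [
    {"doc_id": 2, "token": "cpu"},
    {"doc_id": 1, "token": "ball"},
    {"doc_id": 1, "token": "goal"},
]


def ordered(rows, sequence_columns):
    """Sort rows by the given columns (hypothetical helper)."""
    if isinstance(sequence_columns, str):
        sequence_columns = [sequence_columns]
    return sorted(rows, key=lambda r: tuple(r[c] for c in sequence_columns))


shuffled = rows[:]
random.shuffle(shuffled)
# A unique key gives the same order no matter how the input was shuffled.
print([r["token"] for r in ordered(shuffled, ["doc_id", "token"])])  # ['ball', 'goal', 'cpu']
```

Sorting by "doc_id" alone would leave the two rows of document 1 in whatever order they arrived, which is why the columns must identify each row uniquely.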
RETURNS:
Instance of NaiveBayesTextClassifier.
Output teradataml DataFrames can be accessed using attribute
references, such as NaiveBayesTextClassifierObj.<attribute_name>.
Output teradataml DataFrame attribute name is:
result
RAISES:
TeradataMlException
EXAMPLES:
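# Load the data to run the example.
# This assumes a connection was already created with create_context().
from teradataml import DataFrame, load_example_data
load_example_data("NaiveBayesTextClassifier", "token_table")
# Create teradataml DataFrame
token_table = DataFrame.from_table("token_table")
# Example 1 - Train a BERNOULLI model on token_table.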
nbt_result = NaiveBayesTextClassifier(data = token_table,
token_column = 'token',
doc_id_columns = 'doc_id',
doc_category_column = 'category',
model_type = "BERNOULLI",
data_partition_column = 'category')
# Print the result DataFrame
print(nbt_result.result)
- __repr__(self)
- Returns the string representation for a NaiveBayesTextClassifier class instance.
- get_build_time(self)
- Function to return the build time of the algorithm in seconds.
When the model object is created using retrieve_model(), the value
returned is the one saved in the Model Catalog.
- get_prediction_type(self)
- Function to return the prediction type of the algorithm.
When the model object is created using retrieve_model(), the value
returned is the one saved in the Model Catalog.
- get_target_column(self)
- Function to return the target column of the algorithm.
When the model object is created using retrieve_model(), the value
returned is the one saved in the Model Catalog.
- show_query(self)
- Function to return the underlying SQL query.
When the model object is created using retrieve_model(), None is returned.