This example trains a Naive Bayes text classifier, uses the trained model to predict document categories, and then evaluates the predictions to generate classification statistics.
- Set up the environment.
- Import required libraries.
import tempfile
import getpass
from teradataml import DataFrame, load_example_data, create_context
- Create the connection to the database.
con = create_context(host=getpass.getpass("Hostname: "), username=getpass.getpass("Username: "), password=getpass.getpass("Password: "))
- Load example data and create required dataframes.
load_example_data("textparser", ["complaints", "stop_words"])
complaints = DataFrame.from_table("complaints")
stop_words = DataFrame.from_table("stop_words")
- Train the model and create a classifier for the crash and no_crash categories.
- Check the list of available analytic functions.
from teradataml import display_analytic_functions
display_analytic_functions()
- Import functions TextParser, NaiveBayesTextClassifierTrainer, NaiveBayesTextClassifierPredict.
from teradataml import TextParser, NaiveBayesTextClassifierTrainer, NaiveBayesTextClassifierPredict
- Tokenize the "text_data" column, remove stop words, and accumulate the result by "doc_id" and "category".
complaints_tokenized = TextParser(data=complaints, text_column="text_data", object=stop_words, remove_stopwords=True, accumulate=["doc_id", "category"])
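The tokenization step above can be pictured with a small, database-free sketch. This is only an illustration of the idea (split text into tokens, drop stop words, carry the accumulated columns along); it is not how TextParser is implemented, and the helper name `tokenize`, the regular expression, and the sample text are all assumptions made for this example.

```python
import re

def tokenize(doc_id, category, text, stop_words):
    """Return one (doc_id, category, token) row per token that is not a stop word.

    Hypothetical helper: mimics the shape of a tokenized result with
    "doc_id" and "category" accumulated next to each token.
    """
    # Lowercase the text and keep runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [(doc_id, category, tok) for tok in tokens if tok not in stop_words]

# Made-up sample data, not taken from the "complaints" or "stop_words" tables.
sample_stop_words = {"the", "a", "and", "was"}
rows = tokenize(1, "crash", "The car was hit and the airbag failed", sample_stop_words)
for row in rows:
    print(row)
```

Each output row pairs a single token with the document it came from, which is the shape the training step consumes through its token and category columns.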
- Calculate the conditional probabilities for token-category pairs.
NaiveBayesTextClassifierTrainer_out = NaiveBayesTextClassifierTrainer(data=complaints_tokenized.result, token_column="token", doc_category_column="category")
- Print the result DataFrames.
print(NaiveBayesTextClassifierTrainer_out.result)
print(NaiveBayesTextClassifierTrainer_out.model_data)
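To make "conditional probabilities for token-category pairs" concrete, here is a minimal sketch of a multinomial Naive Bayes estimate in plain Python. It assumes add-one (Laplace) smoothing over a shared vocabulary; the smoothing actually applied by NaiveBayesTextClassifierTrainer may differ, and the function name `train_multinomial_nb` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict

def train_multinomial_nb(rows, alpha=1.0):
    """Estimate P(token | category) from (category, token) rows.

    alpha is an assumed additive smoothing constant (1.0 = Laplace smoothing).
    Returns {category: {token: probability}} over the full vocabulary.
    """
    counts = defaultdict(Counter)
    vocab = set()
    for category, token in rows:
        counts[category][token] += 1
        vocab.add(token)

    model = {}
    for category, token_counts in counts.items():
        # Denominator: all tokens seen in the category plus the smoothing mass.
        total = sum(token_counts.values()) + alpha * len(vocab)
        model[category] = {
            tok: (token_counts[tok] + alpha) / total for tok in vocab
        }
    return model

# Made-up training rows for illustration only.
training_rows = [
    ("crash", "airbag"), ("crash", "hit"), ("crash", "airbag"),
    ("no_crash", "noise"), ("no_crash", "brake"),
]
model = train_multinomial_nb(training_rows)
for category, probs in sorted(model.items()):
    print(category, {tok: round(p, 3) for tok, p in sorted(probs.items())})
```

Smoothing keeps every token's probability above zero in every category, so a token that was never seen in one category does not force that category's score to zero at prediction time.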
- Score the data using NaiveBayesTextClassifierPredict() on the model generated by NaiveBayesTextClassifierTrainer(), where model_type is "MULTINOMIAL".
nbt_predict_out = NaiveBayesTextClassifierPredict(object=NaiveBayesTextClassifierTrainer_out.model_data, newdata=complaints_tokenized.result, input_token_column="token", accumulate="category", doc_id_columns="doc_id")
- Print the result DataFrame.
print(nbt_predict_out.result)
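The scoring step can be sketched the same way. The example below assumes a multinomial model stored as per-category token probabilities and picks the category with the highest log score; the probability values, the equal priors, and the choice to skip tokens unseen in training are all assumptions for illustration, not a description of what NaiveBayesTextClassifierPredict does internally.

```python
import math
from collections import Counter

def predict(tokens, model, priors):
    """Return (best_category, scores) for one tokenized document.

    Scores are log P(category) + sum over tokens of count * log P(token | category).
    """
    token_counts = Counter(tokens)
    scores = {}
    for category, token_probs in model.items():
        score = math.log(priors[category])
        for token, n in token_counts.items():
            if token in token_probs:  # assumed policy: ignore unseen tokens
                score += n * math.log(token_probs[token])
        scores[category] = score
    return max(scores, key=scores.get), scores

# Hypothetical model and priors, made up for this sketch.
toy_model = {
    "crash": {"airbag": 3 / 7, "hit": 2 / 7, "noise": 1 / 7, "brake": 1 / 7},
    "no_crash": {"airbag": 1 / 6, "hit": 1 / 6, "noise": 2 / 6, "brake": 2 / 6},
}
toy_priors = {"crash": 0.5, "no_crash": 0.5}

label_a, _ = predict(["airbag", "hit", "unknown"], toy_model, toy_priors)
label_b, _ = predict(["noise", "brake"], toy_model, toy_priors)
print(label_a, label_b)
```

Working in log space avoids numerical underflow when a document contains many tokens, since products of small probabilities become sums of logs.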
- Convert the prediction column and the category column to the same data type.
from teradataml import ConvertTo
predicted_data = ConvertTo(data=nbt_predict_out.result, target_columns=["category", "prediction"], target_datatype=["VARCHAR(charlen=20,charset=UNICODE,casespecific=NO)"])
- Evaluate classifier.
- Import function ClassificationEvaluator.
from teradataml import ClassificationEvaluator
- Evaluate classification.
ClassificationEvaluator_obj = ClassificationEvaluator(data=predicted_data.result, observation_column='category', prediction_column='prediction', labels=['no_crash','crash'])
- Print the result DataFrames.
print(ClassificationEvaluator_obj.result)
print(ClassificationEvaluator_obj.output_data)
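For readers who want to see what classification statistics of this kind involve, the sketch below computes a confusion matrix, accuracy, and per-label precision, recall, and F1 from observed and predicted labels in plain Python. The function name `evaluate` and the sample labels are hypothetical, and the exact set and layout of metrics returned by ClassificationEvaluator may differ.

```python
def evaluate(observed, predicted, labels):
    """Return (confusion, accuracy, per_label) for paired label lists.

    confusion[o][p] counts documents with observed label o and predicted label p.
    """
    confusion = {o: {p: 0 for p in labels} for o in labels}
    for o, p in zip(observed, predicted):
        confusion[o][p] += 1

    correct = sum(confusion[label][label] for label in labels)
    accuracy = correct / len(observed)

    per_label = {}
    for label in labels:
        tp = confusion[label][label]
        predicted_total = sum(confusion[o][label] for o in labels)
        observed_total = sum(confusion[label].values())
        precision = tp / predicted_total if predicted_total else 0.0
        recall = tp / observed_total if observed_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_label[label] = {"precision": precision, "recall": recall, "f1": f1}
    return confusion, accuracy, per_label

# Made-up labels for illustration only.
observed = ["crash", "crash", "no_crash", "no_crash", "crash"]
predicted = ["crash", "no_crash", "no_crash", "crash", "crash"]
confusion, accuracy, per_label = evaluate(observed, predicted, ["no_crash", "crash"])
print(confusion)
print(accuracy)
print(per_label)
```

Both columns must hold comparable values for the counts to line up, which is why the walkthrough converts the category and prediction columns to the same data type before evaluating.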