Using ClassificationEvaluator to Evaluate the Classification | teradataml

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

This example trains a Naive Bayes text classifier, scores data with the trained model, and then uses ClassificationEvaluator to generate statistics that describe how well the predictions match the observed categories.

Set up the environment.

Import required libraries.

import getpass

from teradataml import DataFrame, load_example_data, create_context

Create the connection to the database.

con = create_context(host=getpass.getpass("Hostname: "),
                     username=getpass.getpass("Username: "),
                     password=getpass.getpass("Password: "))

Load the example data and create the required DataFrames.

load_example_data("textparser", ["complaints", "stop_words"])

complaints = DataFrame.from_table("complaints")

stop_words = DataFrame.from_table("stop_words")

Train the model and create a classifier for the crash and no_crash categories.

Check the list of available analytic functions.
```
from teradataml import display_analytic_functions

display_analytic_functions()
```

Import the functions TextParser, NaiveBayesTextClassifierTrainer, and NaiveBayesTextClassifierPredict.

from teradataml import TextParser, NaiveBayesTextClassifierTrainer, NaiveBayesTextClassifierPredict

Tokenize the "text_data" column and accumulate the results by "doc_id" and "category".

complaints_tokenized = TextParser(data=complaints,
                                  text_column="text_data",
                                  object=stop_words,
                                  remove_stopwords=True,
                                  accumulate=["doc_id", "category"])
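The tokenization step can be pictured with a small pure-Python sketch. It is only an illustration of the idea (lowercase the text, split it into tokens, drop stop words, and carry the accumulated columns along with each token), not the in-database implementation of TextParser. The sample rows, the stop-word set, and the `tokenize` helper are hypothetical.

```python
import re

def tokenize(text, stop_words):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in stop_words]

# Hypothetical rows shaped like (doc_id, category, text_data).
rows = [
    (1, "crash", "The car hit a wall and the airbag failed."),
    (2, "no_crash", "The radio stopped working."),
]
stop_words = {"the", "a", "and"}

# One output row per token; doc_id and category are carried along,
# which mirrors the idea behind the accumulate argument.
tokenized = [
    (doc_id, category, token)
    for doc_id, category, text in rows
    for token in tokenize(text, stop_words)
]

for row in tokenized:
    print(row)
```

Each document becomes several (doc_id, category, token) rows, which is the shape the training step works with.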

Calculate the conditional probabilities for token-category pairs.

NaiveBayesTextClassifierTrainer_out = NaiveBayesTextClassifierTrainer(data=complaints_tokenized.result,
                                                                      token_column="token",
                                                                      doc_category_column="category")
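As a rough illustration of what "conditional probabilities for token-category pairs" means, the sketch below computes a textbook multinomial Naive Bayes estimate with add-one smoothing in plain Python. The smoothing choice, the `train_multinomial_nb` helper, and the sample rows are assumptions made for the example; the exact estimate computed in the database may differ.

```python
from collections import Counter, defaultdict

def train_multinomial_nb(tokenized_rows):
    """Return {(token, category): P(token | category)} using add-one smoothing."""
    counts = defaultdict(Counter)
    for _doc_id, category, token in tokenized_rows:
        counts[category][token] += 1

    vocabulary = {tok for token_counts in counts.values() for tok in token_counts}
    probs = {}
    for category, token_counts in counts.items():
        total = sum(token_counts.values())
        for token in vocabulary:
            # Add-one smoothing keeps unseen token-category pairs above zero.
            probs[(token, category)] = (token_counts[token] + 1) / (total + len(vocabulary))
    return probs

# Hypothetical tokenized rows: (doc_id, category, token).
rows = [
    (1, "crash", "car"), (1, "crash", "wall"),
    (2, "no_crash", "radio"), (2, "no_crash", "car"),
]
model = train_multinomial_nb(rows)
print(model[("car", "crash")])
```

Within each category the probabilities over the vocabulary sum to 1, so the model can be read as one token distribution per category.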

Print the result DataFrames.

print(NaiveBayesTextClassifierTrainer_out.result)
print(NaiveBayesTextClassifierTrainer_out.model_data)

Score the data using NaiveBayesTextClassifierPredict() on the model generated by NaiveBayesTextClassifierTrainer(), where model_type is "MULTINOMIAL".

nbt_predict_out = NaiveBayesTextClassifierPredict(object=NaiveBayesTextClassifierTrainer_out.model_data,
                                                  newdata=complaints_tokenized.result,
                                                  input_token_column='token',
                                                  accumulate="category",
                                                  doc_id_columns='doc_id')
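Conceptually, multinomial scoring picks, for each document, the category with the highest prior probability multiplied by the token probabilities. The sketch below shows that rule in plain Python using log probabilities; the `predict` helper, the probability table, and the priors are hypothetical values chosen for the example, and the handling of unseen tokens (skipping them) is an assumption rather than documented behavior.

```python
import math

def predict(doc_tokens, token_probs, priors):
    """Return the category with the highest log prior plus summed log likelihoods."""
    scores = {}
    for category, prior in priors.items():
        score = math.log(prior)
        for token in doc_tokens:
            p = token_probs.get((token, category))
            if p is not None:  # assumption: tokens absent from the model are skipped
                score += math.log(p)
        scores[category] = score
    return max(scores, key=scores.get)

# Hypothetical P(token | category) table and category priors.
token_probs = {
    ("car", "crash"): 0.4, ("wall", "crash"): 0.4, ("radio", "crash"): 0.2,
    ("car", "no_crash"): 0.4, ("wall", "no_crash"): 0.2, ("radio", "no_crash"): 0.4,
}
priors = {"crash": 0.5, "no_crash": 0.5}

print(predict(["car", "wall"], token_probs, priors))
print(predict(["radio"], token_probs, priors))
```

Summing logarithms instead of multiplying raw probabilities avoids numeric underflow when documents contain many tokens.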

Print the result DataFrame.
```
print(nbt_predict_out.result)
```

Convert the prediction column and the category column to the same data type so that the evaluator can compare them directly.

from teradataml import ConvertTo

predicted_data = ConvertTo(data=nbt_predict_out.result,
                           target_columns=["category", "prediction"],
                           target_datatype=["VARCHAR(charlen=20,charset=UNICODE,casespecific=NO)"])

Evaluate the classifier.

Import function ClassificationEvaluator.

from teradataml import ClassificationEvaluator

Evaluate the classification.

ClassificationEvaluator_obj = ClassificationEvaluator(data=predicted_data.result,
                                                      observation_column='category',
                                                      prediction_column='prediction',
                                                      labels=['no_crash', 'crash'])

Print the result DataFrames.

print(ClassificationEvaluator_obj.result)
print(ClassificationEvaluator_obj.output_data)
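To make the kind of statistics produced by a classification evaluation concrete, the following plain-Python sketch builds a confusion matrix from observed and predicted labels and derives per-label precision, recall, F1, and overall accuracy. It is a minimal illustration under assumed sample data; the `evaluate` helper is hypothetical, and the exact columns and metrics returned in the result and output_data DataFrames are not reproduced here.

```python
from collections import Counter

def evaluate(observed, predicted, labels):
    """Return (confusion counts, per-label metrics, accuracy)."""
    # Keys of the confusion counter are (observed, predicted) pairs.
    confusion = Counter(zip(observed, predicted))
    metrics = {}
    for label in labels:
        tp = confusion[(label, label)]
        fp = sum(confusion[(other, label)] for other in labels if other != label)
        fn = sum(confusion[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(confusion[(label, label)] for label in labels) / len(observed)
    return confusion, metrics, accuracy

# Hypothetical observed and predicted categories for five documents.
observed = ["crash", "crash", "no_crash", "no_crash", "crash"]
predicted = ["crash", "no_crash", "no_crash", "no_crash", "crash"]

confusion, metrics, accuracy = evaluate(observed, predicted, ["no_crash", "crash"])
print(confusion)
print(metrics)
print(accuracy)
```

Both label columns must hold comparable values for the pairwise counts to be meaningful, which is why the example converts them to the same data type before evaluating.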