Using ClassificationEvaluator to Evaluate the Classification | teradataml

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

This example trains a Naive Bayes text classifier, uses it to predict a category for each document, and then evaluates those predictions with ClassificationEvaluator to generate classification statistics.

  1. Set up the environment.
    1. Import required libraries.
      import tempfile
      import getpass
      from teradataml import DataFrame, load_example_data, create_context
    2. Create the connection to the database.
      con = create_context(host=getpass.getpass("Hostname: "),
                           username=getpass.getpass("Username: "),
                           password=getpass.getpass("Password: "))
    3. Load example data and create required dataframes.
      load_example_data("textparser", ["complaints", "stop_words"]) 
      complaints = DataFrame.from_table("complaints")
      stop_words = DataFrame.from_table("stop_words")
  2. Train the model and create a classifier for the crash and no_crash categories.
    1. Check the list of available analytic functions.
      from teradataml import display_analytic_functions
      display_analytic_functions()
    2. Import the functions TextParser, NaiveBayesTextClassifierTrainer, and NaiveBayesTextClassifierPredict.
      from teradataml import TextParser, NaiveBayesTextClassifierTrainer, NaiveBayesTextClassifierPredict
      
    3. Tokenize the "text_data" column and accumulate the result by "doc_id" and "category".
      complaints_tokenized = TextParser(data=complaints,
                                        text_column="text_data",
                                        object=stop_words,
                                        remove_stopwords=True,
                                        accumulate=["doc_id", "category"])
      
    4. Calculate the conditional probabilities for token-category pairs.
      NaiveBayesTextClassifierTrainer_out = NaiveBayesTextClassifierTrainer(data=complaints_tokenized.result,
                                                                            token_column="token",
                                                                            doc_category_column="category")
      
    5. Print the result DataFrames.
      print(NaiveBayesTextClassifierTrainer_out.result)
      print(NaiveBayesTextClassifierTrainer_out.model_data)
      
    6. Score the data using NaiveBayesTextClassifierPredict() on the model generated by NaiveBayesTextClassifierTrainer(), where model_type is "MULTINOMIAL".
      nbt_predict_out = NaiveBayesTextClassifierPredict(object=NaiveBayesTextClassifierTrainer_out.model_data,
                                                        newdata=complaints_tokenized.result,
                                                        input_token_column="token",
                                                        accumulate="category",
                                                        doc_id_columns="doc_id")
      
    7. Print the result DataFrame.
      print(nbt_predict_out.result)
  3. Convert the prediction and category columns to the same data type.
    from teradataml import ConvertTo
    predicted_data = ConvertTo(data=nbt_predict_out.result,
                               target_columns=["category", "prediction"],
                               target_datatype=["VARCHAR(charlen=20,charset=UNICODE,casespecific=NO)"])
  4. Evaluate the classifier.
    1. Import the ClassificationEvaluator function.
      from teradataml import ClassificationEvaluator
    2. Evaluate the classification.
      ClassificationEvaluator_obj = ClassificationEvaluator(data=predicted_data.result,
                                                            observation_column='category',
                                                            prediction_column='prediction',
                                                            labels=['no_crash','crash'])
    3. Print the result DataFrames.
      print(ClassificationEvaluator_obj.result)
      print(ClassificationEvaluator_obj.output_data)
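
To build intuition for what the steps above compute, the workflow can be sketched locally in plain Python, with no Vantage connection. This is only an illustration under stated assumptions, not how teradataml or the in-database functions are implemented: the toy documents, the stop-word set, and the helper names (tokenize, train, predict, evaluate) are all hypothetical, and add-one (Laplace) smoothing is assumed for the token-category probabilities.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data standing in for the "complaints" and "stop_words" tables.
DOCS = [
    (1, "crash", "the car hit the wall and the airbag failed"),
    (2, "crash", "brakes failed and the car hit a tree"),
    (3, "no_crash", "the radio stopped working"),
    (4, "no_crash", "the paint is peeling and the radio is noisy"),
]
STOP_WORDS = {"the", "and", "a", "is"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def train(docs):
    """Return log priors and add-one-smoothed log P(token | category)."""
    token_counts = defaultdict(Counter)
    doc_counts = Counter()
    for _, category, text in docs:
        doc_counts[category] += 1
        token_counts[category].update(tokenize(text))
    vocab = {t for counts in token_counts.values() for t in counts}
    log_prior = {c: math.log(n / len(docs)) for c, n in doc_counts.items()}
    log_cond = {}
    for c, counts in token_counts.items():
        total = sum(counts.values()) + len(vocab)
        log_cond[c] = {t: math.log((counts[t] + 1) / total) for t in vocab}
    return log_prior, log_cond


def predict(model, text):
    """Pick the category with the highest log posterior; skip unseen tokens."""
    log_prior, log_cond = model
    scores = {
        c: log_prior[c]
        + sum(log_cond[c][t] for t in tokenize(text) if t in log_cond[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


def evaluate(observed, predicted, labels):
    """Return accuracy and per-label precision, recall, and F1."""
    pairs = list(zip(observed, predicted))
    accuracy = sum(o == p for o, p in pairs) / len(pairs)
    per_label = {}
    for label in labels:
        tp = sum(o == label and p == label for o, p in pairs)
        fp = sum(o != label and p == label for o, p in pairs)
        fn = sum(o == label and p != label for o, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_label[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_label


model = train(DOCS)
observed = [category for _, category, _ in DOCS]
predicted = [predict(model, text) for _, _, text in DOCS]
accuracy, per_label = evaluate(observed, predicted, ["no_crash", "crash"])
print(accuracy)
print(per_label)
```

As in the walkthrough, the sketch scores the same documents it was trained on, so the resulting metrics describe fit to the training data rather than performance on unseen complaints. Holding out a separate test set would give a more honest evaluation.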