NaiveBayesTextClassifierTrainer2 Example: IsTokenized ('true')

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
9.02
9.01
2.0
1.3
Published
February 2022
Language
English (United States)
Last Update
2022-02-10
Product Category
Teradata Vantage™

Input

  • InputTable: complaints_tokenized, created by calling TextTokenizer with the table complaints (which appears in NaiveBayesTextClassifierTrainer2 Example: IsTokenized ('false')):
    CREATE MULTISET TABLE complaints_tokenized AS (
      SELECT * FROM TextTokenizer (
        ON complaints PARTITION BY ANY
        USING
        InputLanguage ('en')
        OutputDelimiter (' ')
        OutputByWord ('true')
        Accumulate ('doc_id','category')
        TextColumn ('text_data')
      ) AS dt
    ) WITH DATA;
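
Before training, it can help to confirm the shape of the tokenized table. The query below is an illustrative sketch, not part of the original example. It assumes only the columns that this example names elsewhere (doc_id and category from Accumulate, token from the trainer's TokenColumn argument) and counts documents and token rows per category.

```sql
-- Illustrative sanity check: documents and token rows per category.
-- Assumes complaints_tokenized has the columns doc_id, category, and token.
SELECT category,
       COUNT(DISTINCT doc_id) AS doc_count,
       COUNT(*)               AS token_rows
FROM complaints_tokenized
GROUP BY category
ORDER BY category;
```

A category with very few documents is worth noting here, because every probability the trainer estimates for that category rests on that small sample.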

SQL Call

This call creates a Bernoulli model table, complaints_tokens_model, by running NaiveBayesTextClassifierTrainer2 on the table complaints_tokenized.

SELECT * FROM NaiveBayesTextClassifierTrainer2 (
  ON complaints_tokenized AS InputTable
  OUT TABLE ModelTable (complaints_tokens_model)
  USING
  TokenColumn ('token')
  IsTokenized ('TRUE')
  ModelType ('Bernoulli')
  DocIdColumns ('doc_id')
  DocCategoryColumn ('category')
) AS dt;
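
Once the call completes, the model table can be queried like any other table. As a small sketch, the query below returns only the rows whose token values are the two reserved names that appear in the Output section of this example. Treating them as metadata rows (model type and category prior) is an interpretation of those names, not something this page states.

```sql
-- Illustrative: list the rows that describe the model rather than a token.
-- The two token names are copied from the output of this example.
SELECT token, category, prob
FROM complaints_tokens_model
WHERE token IN ('NAIVE_BAYES_TEXT_MODEL_TYPE',
                'NAIVE_BAYES_PRIOR_PROBABILITY')
ORDER BY token, category;
```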

Output

This query returns the model table, ordered by descending probability:

SELECT * FROM complaints_tokens_model ORDER BY prob DESC;
 token                                 category  prob                 
 ------------------------------------- --------- -------------------- 
 NAIVE_BAYES_TEXT_MODEL_TYPE           BERNOULLI                  1.0
 .                                     no_crash    0.9411764705882353
 .                                     crash       0.8571428571428571
 NAIVE_BAYES_PRIOR_PROBABILITY         no_crash                  0.75
 and                                   crash       0.7142857142857143
 a                                     crash       0.7142857142857143
 the                                   no_crash    0.7058823529411765
 vehicle                               no_crash    0.5882352941176471
 driver's                              crash       0.5714285714285714
 approximately                         crash       0.5714285714285714
 head-on                               crash       0.5714285714285714
 not                                   crash       0.5714285714285714
 hit                                   crash       0.5714285714285714
 mph                                   crash       0.5714285714285714
 air                                   crash       0.5714285714285714
 deploy                                crash       0.5714285714285714
 passenger's                           crash       0.5714285714285714
 did                                   crash       0.5714285714285714
 side                                  crash       0.5714285714285714
 vehicle                               crash       0.5714285714285714
 and                                   no_crash    0.5294117647058824
 to                                    no_crash   0.47058823529411764
 at                                    crash      0.42857142857142855
 in                                    crash      0.42857142857142855
 another                               crash      0.42857142857142855
 on                                    crash      0.42857142857142855
 consumer                              crash      0.42857142857142855
 injuries                              crash      0.42857142857142855
 driver                                crash      0.42857142857142855
 deployed                              crash      0.42857142857142855
 into                                  crash      0.42857142857142855
 the                                   crash      0.42857142857142855
 was                                   crash      0.42857142857142855
 to                                    crash      0.42857142857142855
 bags                                  crash      0.42857142857142855
 front                                 crash      0.42857142857142855
 airbags                               crash      0.42857142857142855
 was                                   no_crash    0.4117647058823529
 on                                    no_crash   0.35294117647058826
 consumer                              no_crash   0.35294117647058826
...
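
The probabilities above look like add-one (Laplace) smoothed document frequencies: 0.8571428571428571 equals 6/7 and 0.9411764705882353 equals 16/17, which would fit 5 crash documents and 15 no_crash documents, with a no_crash prior of 15/20 = 0.75. This is an inference from the printed values, not behavior documented on this page. Under that assumption, the following sketch recomputes the estimate directly from complaints_tokenized so that it can be compared with the prob column. The column names est_prob, docs_with_token, and docs_in_category are introduced here for illustration.

```sql
-- Hypothetical cross-check, assuming the Bernoulli estimate is
--   (documents in the category containing the token + 1)
--   / (documents in the category + 2).
SELECT t.token,
       t.category,
       CAST(t.docs_with_token + 1 AS FLOAT)
         / (c.docs_in_category + 2) AS est_prob
FROM (
  -- Number of documents per category that contain each token.
  SELECT token, category, COUNT(DISTINCT doc_id) AS docs_with_token
  FROM complaints_tokenized
  GROUP BY token, category
) AS t
JOIN (
  -- Number of documents per category.
  SELECT category, COUNT(DISTINCT doc_id) AS docs_in_category
  FROM complaints_tokenized
  GROUP BY category
) AS c
  ON t.category = c.category
ORDER BY est_prob DESC;
```

If est_prob and prob disagree for some tokens, the smoothing assumption does not hold for this function and the model table should be taken as authoritative.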
