NaiveBayesTextClassifierTrainer2 Example: IsTokenized ('false') | Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 9.02, 9.01, 2.0, 1.3
Published: February 2022
Language: English (United States)
Last Update: 2022-02-10
dita:id: B700-4003
lifecycle: previous
Product Category: Teradata Vantage™

Input

  • InputTable: complaints, a log of vehicle complaints

    In complaints, the category column indicates whether the car has been in a crash.

complaints
doc_id | text_data | category
------ | --------- | --------
1 | consumer was driving approximately 45 mph hit a deer with the front bumper and then ran into an embankment head-on passenger's side air bag did deploy hit windshield and deployed outward. driver's side airbag cover opened but did not inflate it was still folded causing injuries. | crash
2 | when vehicle was involved in a crash totalling vehicle driver's side/ passenger's side air bags did not deploy. vehicle was making a left turn and was hit by a ford f350 traveling about 35 mph on the front passenger's side. driver hit his head-on the steering wheel. hurt his knee and received neck and back injuries. | crash
3 | consumer has experienced following problems; 1.) both lower ball joints wear out excessively; 2.) head gasket leaks; and 3.) cruise control would shut itself off while driving without foot pressing on brake pedal. | no_crash
... | ... | ...
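
Because category is the label the trainer learns from, it can be useful to see how many documents carry each label before training. A minimal sketch, assuming only the table and column names shown above (doc_count is an illustrative alias, not part of the example):

```sql
-- Count the training documents per label in the example input table.
SELECT category,
       COUNT(*) AS doc_count
FROM complaints
GROUP BY category
ORDER BY doc_count DESC;
```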

SQL Call

This call creates a Bernoulli model table, complaints_tokens_model, by calling NaiveBayesTextClassifierTrainer2 on the input table complaints. Because IsTokenized is 'FALSE', the function tokenizes the text_data column internally.

SELECT * FROM NaiveBayesTextClassifierTrainer2 (
  ON complaints AS InputTable
  OUT TABLE ModelTable (complaints_tokens_model)
  USING
  TextColumn ('text_data')
  IsTokenized ('FALSE')
  ModelType ('Bernoulli')
  DocIdColumns ('doc_id')
  DocCategoryColumn ('category')
) AS dt;

Output

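The following query returns the model table, sorted by descending prob. The first rows record the model type and a prior probability; the remaining rows give one probability per token and category. Tokens appear in stemmed form (for example, vehicl and passeng):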

SELECT * FROM complaints_tokens_model ORDER BY prob desc;
 token                                 category  prob                 
 ------------------------------------- --------- -------------------- 
 NAIVE_BAYES_TEXT_MODEL_TYPE           BERNOULLI                  1.0
 NAIVE_BAYES_PRIOR_PROBABILITY         no_crash                  0.75
 deploy                                crash       0.7142857142857143
 a                                     crash       0.7142857142857143
 and                                   crash       0.7142857142857143
 driver                                crash       0.7142857142857143
 the                                   no_crash    0.7058823529411765
 vehicl                                no_crash    0.5882352941176471
 passeng                               crash       0.5714285714285714
 airbag                                crash       0.5714285714285714
 approxim                              crash       0.5714285714285714
 head-on                               crash       0.5714285714285714
 bag                                   crash       0.5714285714285714
 did                                   crash       0.5714285714285714
 vehicl                                crash       0.5714285714285714
 air                                   crash       0.5714285714285714
 mph                                   crash       0.5714285714285714
 not                                   crash       0.5714285714285714
 side                                  crash       0.5714285714285714
 hit                                   crash       0.5714285714285714
 and                                   no_crash    0.5294117647058824
 to                                    no_crash   0.47058823529411764
 into                                  crash      0.42857142857142855
 to                                    crash      0.42857142857142855
 anoth                                 crash      0.42857142857142855
 in                                    crash      0.42857142857142855
 on                                    crash      0.42857142857142855
 the                                   crash      0.42857142857142855
 at                                    crash      0.42857142857142855
 crash                                 crash      0.42857142857142855
 injuri                                crash      0.42857142857142855
 consum                                crash      0.42857142857142855
 front                                 crash      0.42857142857142855
 was                                   crash      0.42857142857142855
 was                                   no_crash    0.4117647058823529
 has                                   no_crash   0.35294117647058826
 is                                    no_crash   0.35294117647058826
 when                                  no_crash   0.35294117647058826
 on                                    no_crash   0.35294117647058826
 consum                                no_crash   0.35294117647058826
...
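
To see which tokens separate the two categories most strongly, the model table can be joined to itself on token. This is only a sketch: it assumes the model table has exactly the columns shown above (token, category, prob) and that the elided rows include a crash prior-probability row, which the filter below excludes. The aliases c, n, crash_prob, and no_crash_prob are illustrative.

```sql
-- Side-by-side token probabilities for the two categories in the example model.
SELECT c.token,
       c.prob AS crash_prob,
       n.prob AS no_crash_prob
FROM complaints_tokens_model AS c
JOIN complaints_tokens_model AS n
  ON c.token = n.token
WHERE c.category = 'crash'
  AND n.category = 'no_crash'
  -- Skip the prior-probability row, which is model metadata, not a token.
  AND c.token <> 'NAIVE_BAYES_PRIOR_PROBABILITY'
-- Largest gap in favor of 'crash' first.
ORDER BY c.prob - n.prob DESC;
```

Because this is an inner join, any token that has a row for only one category is dropped from the result. The model-type row is excluded as well, since its category value is BERNOULLI.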
