NaiveBayesTextClassifierTrainer Example - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.10
1.1
Published
October 2019
Language
English (United States)
Last Update
2019-12-31
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

Input

  • input: Created by applying the TextTokenizer function to the training table complaints, a log of vehicle complaints.

    In complaints, the category column indicates whether the vehicle was in a crash (crash) or not (no_crash).

complaints
doc_id | text_data | category
1 | consumer was driving approximately 45 mph hit a deer with the front bumper and then ran into an embankment head-on passenger's side air bag did deploy hit windshield and deployed outward. driver's side airbag cover opened but did not inflate it was still folded causing injuries. | crash
2 | when vehicle was involved in a crash totalling vehicle driver's side/ passenger's side air bags did not deploy. vehicle was making a left turn and was hit by a ford f350 traveling about 35 mph on the front passenger's side. driver hit his head-on the steering wheel. hurt his knee and received neck and back injuries. | crash
3 | consumer has experienced following problems; 1.) both lower ball joints wear out excessively; 2.) head gasket leaks; and 3.) cruise control would shut itself off while driving without foot pressing on brake pedal. | no_crash
... | ... | ...
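The tokenizing step can be pictured with a short Python sketch. With OutputByWord ('true') and Accumulate ('doc_id', 'category'), as used in the SQL call, each complaint becomes one row per token that still carries its doc_id and category, and the query then lowercases each token. The names `TokenRow` and `tokenize_rows` are hypothetical, and the regular expression is an assumption chosen for illustration, not TextTokenizer's actual rule set. For example, it splits `side/` and `injuries.dealer`, which appear as single tokens in the model output.

```python
import re
from typing import Iterator, NamedTuple


class TokenRow(NamedTuple):
    """One tokenized row: the shape of the rows fed to the trainer."""
    doc_id: int
    token: str
    category: str


# Assumed splitting rule (illustration only): a run of word characters,
# apostrophes, and hyphens is one token; any other non-space character is
# a token by itself. TextTokenizer's real rules differ.
_TOKEN_RE = re.compile(r"[\w'-]+|[^\w\s]")


def tokenize_rows(doc_id: int, text: str, category: str) -> Iterator[TokenRow]:
    """Yield one row per token, carrying doc_id and category along
    (like Accumulate) and lowercasing the token (like lower(token))."""
    for match in _TOKEN_RE.finditer(text):
        yield TokenRow(doc_id, match.group(0).lower(), category)
```

For example, `list(tokenize_rows(3, "2.) head gasket leaks;", "no_crash"))` yields rows for the tokens `2`, `.`, `)`, `head`, `gasket`, `leaks`, and `;`, each tagged with doc_id 3 and category no_crash.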

SQL Call

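This call creates the model table complaints_tokens_model by calling NaiveBayesTextClassifierTrainer. The trainer's input is prepared inside the call: TextTokenizer splits the text_data column of complaints into one row per token, keeping doc_id and category; the inner SELECT lowercases each token; and NaiveBayesTextClassifierInternal processes the resulting rows partitioned by category. NaiveBayesTextClassifierTrainer then combines that output in a single partition (PARTITION BY 1).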

CREATE MULTISET TABLE complaints_tokens_model AS (
  SELECT * FROM NaiveBayesTextClassifierTrainer (
    ON NaiveBayesTextClassifierInternal (
      ON (
        SELECT doc_id, lower(token) AS token, category FROM TextTokenizer (
          ON complaints PARTITION BY ANY
          USING
          TextColumn ('text_data')
          OutputByWord ('true')
          Accumulate ('doc_id', 'category')
        ) AS dt1
      ) AS InputTable PARTITION BY category
      USING
      TokenColumn ('token')
      ModelType ('Bernoulli')
      DocIDColumns ('doc_id')
      DocCategoryColumn ('category')
    ) AS dt2 PARTITION BY 1
  ) AS dt
) WITH DATA;
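With ModelType ('Bernoulli'), a token counts only by whether it occurs in a document, not by how often. The probabilities in the output are consistent with add-one (Laplace) smoothing over document counts. Under that reading, P(token | category) = (documents in the category containing the token + 1) / (documents in the category + 2), the prior is the category's share of documents, and a token never seen in a category gets 1 / (documents in the category + 2). The Python sketch below builds a model of that shape from (doc_id, token, category) rows. The smoothing formula is inferred from the example output, not taken from the function's documentation. `train_bernoulli` is a hypothetical helper, not Teradata's implementation, and it does in one pass what the SQL call splits between NaiveBayesTextClassifierInternal and NaiveBayesTextClassifierTrainer.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

PRIOR = "NAIVE_BAYES_PRIOR_PROBABILITY"
MISSING = "NAIVE_BAYES_MISSING_TOKEN_PROBABILITY"
MODEL_TYPE = "NAIVE_BAYES_TEXT_MODEL_TYPE"


def train_bernoulli(
    rows: Iterable[Tuple[int, str, str]],
) -> Dict[Tuple[str, str], float]:
    """Build {(token, category): prob} from (doc_id, token, category) rows.

    Assumption: add-one smoothing over document counts, inferred from the
    example output rather than taken from the function's documentation.
    Documents that produce no tokens never appear in rows, so they are not counted.
    """
    docs_in_category: Dict[str, Set[int]] = defaultdict(set)
    docs_with_token: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for doc_id, token, category in rows:
        docs_in_category[category].add(doc_id)
        # Sets record presence only: a token repeated in one document counts once.
        docs_with_token[(token, category)].add(doc_id)

    total_docs = sum(len(ids) for ids in docs_in_category.values())
    # Mirrors the NAIVE_BAYES_TEXT_MODEL_TYPE row in the example output.
    model: Dict[Tuple[str, str], float] = {(MODEL_TYPE, "BERNOULLI"): 1.0}
    for category, ids in docs_in_category.items():
        n = len(ids)
        model[(PRIOR, category)] = n / total_docs
        # Probability for a token never seen in this category's documents.
        model[(MISSING, category)] = 1 / (n + 2)
    for (token, category), ids in docs_with_token.items():
        n = len(docs_in_category[category])
        model[(token, category)] = (len(ids) + 1) / (n + 2)
    return model
```

Because each count is a set of document IDs, the model depends only on which documents contain a token, which is what distinguishes the Bernoulli variant from a multinomial model that counts occurrences.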

Output

This query returns the following table:

SELECT * FROM complaints_tokens_model ORDER BY prob desc;
 token                                 category  prob                 
 ------------------------------------- --------- -------------------- 
 NAIVE_BAYES_TEXT_MODEL_TYPE           BERNOULLI                  1.0
 .                                     no_crash    0.9411764705882353
 .                                     crash       0.8571428571428571
 NAIVE_BAYES_PRIOR_PROBABILITY         no_crash                  0.75
 a                                     crash       0.7142857142857143
 and                                   crash       0.7142857142857143
 the                                   no_crash    0.7058823529411765
 vehicle                               no_crash    0.5882352941176471
 driver's                              crash       0.5714285714285714
 side                                  crash       0.5714285714285714
 did                                   crash       0.5714285714285714
 deploy                                crash       0.5714285714285714
 hit                                   crash       0.5714285714285714
 head-on                               crash       0.5714285714285714
 not                                   crash       0.5714285714285714
 approximately                         crash       0.5714285714285714
 mph                                   crash       0.5714285714285714
 air                                   crash       0.5714285714285714
 vehicle                               crash       0.5714285714285714
 passenger's                           crash       0.5714285714285714
 and                                   no_crash    0.5294117647058824
 to                                    no_crash   0.47058823529411764
 airbags                               crash      0.42857142857142855
 at                                    crash      0.42857142857142855
 another                               crash      0.42857142857142855
 injuries                              crash      0.42857142857142855
 on                                    crash      0.42857142857142855
 deployed                              crash      0.42857142857142855
 into                                  crash      0.42857142857142855
 consumer                              crash      0.42857142857142855
 to                                    crash      0.42857142857142855
 was                                   crash      0.42857142857142855
 bags                                  crash      0.42857142857142855
 the                                   crash      0.42857142857142855
 driver                                crash      0.42857142857142855
 front                                 crash      0.42857142857142855
 in                                    crash      0.42857142857142855
 was                                   no_crash    0.4117647058823529
 on                                    no_crash   0.35294117647058826
 manufacturer                          no_crash   0.35294117647058826
 consumer                              no_crash   0.35294117647058826
 when                                  no_crash   0.35294117647058826
 has                                   no_crash   0.35294117647058826
 is                                    no_crash   0.35294117647058826
 a                                     no_crash   0.29411764705882354
 in                                    no_crash   0.29411764705882354
 not                                   no_crash   0.29411764705882354
 dealer                                no_crash   0.29411764705882354
 outward                               crash       0.2857142857142857
 70mph                                 crash       0.2857142857142857
 impact                                crash       0.2857142857142857
 rear                                  crash       0.2857142857142857
 neck                                  crash       0.2857142857142857
 wheel                                 crash       0.2857142857142857
 knee                                  crash       0.2857142857142857
 truck                                 crash       0.2857142857142857
 airbag                                crash       0.2857142857142857
 by                                    crash       0.2857142857142857
 about                                 crash       0.2857142857142857
 turn                                  crash       0.2857142857142857
 left                                  crash       0.2857142857142857
 forward                               crash       0.2857142857142857
 it                                    crash       0.2857142857142857
 injuries.dealer                       crash       0.2857142857142857
 involved                              crash       0.2857142857142857
 week                                  crash       0.2857142857142857
 traveling                             crash       0.2857142857142857
 idle                                  crash       0.2857142857142857
 side/                                 crash       0.2857142857142857
 car                                   crash       0.2857142857142857
 but                                   crash       0.2857142857142857
 rearended                             crash       0.2857142857142857
 incident                              crash       0.2857142857142857
 folded                                crash       0.2857142857142857
 occasions                             crash       0.2857142857142857
 with                                  crash       0.2857142857142857
 making                                crash       0.2857142857142857
 bag                                   crash       0.2857142857142857
 raced                                 crash       0.2857142857142857
 engine                                crash       0.2857142857142857
 back                                  crash       0.2857142857142857
 50                                    crash       0.2857142857142857
 80                                    crash       0.2857142857142857
 one                                   crash       0.2857142857142857
 cover                                 crash       0.2857142857142857
 slowing                               crash       0.2857142857142857
 35                                    crash       0.2857142857142857
 received                              crash       0.2857142857142857
 driving                               crash       0.2857142857142857
 park                                  crash       0.2857142857142857
 upon                                  crash       0.2857142857142857
 totalling                             crash       0.2857142857142857
 fence                                 crash       0.2857142857142857
 hurt                                  crash       0.2857142857142857
 when                                  crash       0.2857142857142857
 mphand                                crash       0.2857142857142857
 f350                                  crash       0.2857142857142857
 then                                  crash       0.2857142857142857
 his                                   crash       0.2857142857142857
 ran                                   crash       0.2857142857142857
 sustained                             crash       0.2857142857142857
 has                                   crash       0.2857142857142857
 condition                             crash       0.2857142857142857
 65                                    crash       0.2857142857142857
 dual                                  crash       0.2857142857142857
 neither                               crash       0.2857142857142857
 an                                    crash       0.2857142857142857
 or                                    crash       0.2857142857142857
 deer                                  crash       0.2857142857142857
 for                                   crash       0.2857142857142857
 45                                    crash       0.2857142857142857
 embankment                            crash       0.2857142857142857
 still                                 crash       0.2857142857142857
 why                                   crash       0.2857142857142857
 lurched                               crash       0.2857142857142857
 prior                                 crash       0.2857142857142857
 two                                   crash       0.2857142857142857
 building                              crash       0.2857142857142857
 windshield                            crash       0.2857142857142857
 ended                                 crash       0.2857142857142857
 bumper                                crash       0.2857142857142857
 steering                              crash       0.2857142857142857
 had                                   crash       0.2857142857142857
 ford                                  crash       0.2857142857142857
 determine                             crash       0.2857142857142857
 shop                                  crash       0.2857142857142857
 high                                  crash       0.2857142857142857
 opened                                crash       0.2857142857142857
 crash                                 crash       0.2857142857142857
 inflate                               crash       0.2857142857142857
 causing                               crash       0.2857142857142857
 dealer                                crash       0.2857142857142857
 while                                 crash       0.2857142857142857
 been                                  crash       0.2857142857142857
 crashed                               crash       0.2857142857142857
 NAIVE_BAYES_PRIOR_PROBABILITY         crash                     0.25
 would                                 no_crash   0.23529411764705882
 by                                    no_crash   0.23529411764705882
 at                                    no_crash   0.23529411764705882
 also                                  no_crash   0.23529411764705882
 of                                    no_crash   0.23529411764705882
 work                                  no_crash   0.23529411764705882
 replaced                              no_crash   0.23529411764705882
 recall                                no_crash   0.23529411764705882
 problem                               no_crash   0.23529411764705882
 been                                  no_crash   0.23529411764705882
 had                                   no_crash   0.23529411764705882
 causing                               no_crash   0.23529411764705882
 will                                  no_crash   0.23529411764705882
 this                                  no_crash   0.17647058823529413
 defect                                no_crash   0.17647058823529413
 out                                   no_crash   0.17647058823529413
 wheel                                 no_crash   0.17647058823529413
 left                                  no_crash   0.17647058823529413
 repaired                              no_crash   0.17647058823529413
 engine                                no_crash   0.17647058823529413
 wipers                                no_crash   0.17647058823529413
 broke                                 no_crash   0.17647058823529413
 &                                     no_crash   0.17647058823529413
 times                                 no_crash   0.17647058823529413
 be                                    no_crash   0.17647058823529413
 shut                                  no_crash   0.17647058823529413
 driving                               no_crash   0.17647058823529413
 off                                   no_crash   0.17647058823529413
 under                                 no_crash   0.17647058823529413
 have                                  no_crash   0.17647058823529413
 miles                                 no_crash   0.17647058823529413
 switch                                no_crash   0.17647058823529413
 after                                 no_crash   0.17647058823529413
 from                                  no_crash   0.17647058823529413
 that                                  no_crash   0.17647058823529413
 an                                    no_crash   0.17647058823529413
 determine                             no_crash   0.17647058823529413
 3                                     no_crash   0.17647058823529413
 informed                              no_crash   0.17647058823529413
 notified                              no_crash   0.17647058823529413
 control                               no_crash   0.17647058823529413
 owner                                 no_crash   0.17647058823529413
 which                                 no_crash   0.17647058823529413
 ignition                              no_crash   0.17647058823529413
 windshield                            no_crash   0.17647058823529413
 down                                  no_crash   0.17647058823529413
 transmission                          no_crash   0.17647058823529413
 while                                 no_crash   0.17647058823529413
 up                                    no_crash   0.17647058823529413
 front                                 no_crash   0.17647058823529413
 still                                 no_crash   0.17647058823529413
 NAIVE_BAYES_MISSING_TOKEN_PROBABILITY crash      0.14285714285714285
 four                                  no_crash   0.11764705882352941
 4                                     no_crash   0.11764705882352941
 it                                    no_crash   0.11764705882352941
 cruise                                no_crash   0.11764705882352941
 brake's                               no_crash   0.11764705882352941
 increasedit                           no_crash   0.11764705882352941
 jiggle                                no_crash   0.11764705882352941
 around                                no_crash   0.11764705882352941
 2                                     no_crash   0.11764705882352941
 own                                   no_crash   0.11764705882352941
 stuck                                 no_crash   0.11764705882352941
 recker                                no_crash   0.11764705882352941
 sunroof                               no_crash   0.11764705882352941
 rear                                  no_crash   0.11764705882352941
 turned                                no_crash   0.11764705882352941
 dealership                            no_crash   0.11764705882352941
 transfer                              no_crash   0.11764705882352941
 hitting                               no_crash   0.11764705882352941
 notfied                               no_crash   0.11764705882352941
 resulted                              no_crash   0.11764705882352941
 hill                                  no_crash   0.11764705882352941
 owners                                no_crash   0.11764705882352941
 speeds                                no_crash   0.11764705882352941
 10mph                                 no_crash   0.11764705882352941
 incline                               no_crash   0.11764705882352941
 speed                                 no_crash   0.11764705882352941
 change                                no_crash   0.11764705882352941
 speedometer                           no_crash   0.11764705882352941
 stalled                               no_crash   0.11764705882352941
 turn                                  no_crash   0.11764705882352941
 repairs                               no_crash   0.11764705882352941
 airbag                                no_crash   0.11764705882352941
 housing                               no_crash   0.11764705882352941
 provide                               no_crash   0.11764705882352941
 thousand                              no_crash   0.11764705882352941
 belts/speed                           no_crash   0.11764705882352941
 problems                              no_crash   0.11764705882352941
 referenced                            no_crash   0.11764705882352941
 coming                                no_crash   0.11764705882352941
 almost                                no_crash   0.11764705882352941
 experienced                           no_crash   0.11764705882352941
 module                                no_crash   0.11764705882352941
 about                                 no_crash   0.11764705882352941
 intermittently                        no_crash   0.11764705882352941
 happened                              no_crash   0.11764705882352941
 inoperative                           no_crash   0.11764705882352941
 truck                                 no_crash   0.11764705882352941
 electrical                            no_crash   0.11764705882352941
 case                                  no_crash   0.11764705882352941
 traveling                             no_crash   0.11764705882352941
 wants                                 no_crash   0.11764705882352941
 1998                                  no_crash   0.11764705882352941
 what                                  no_crash   0.11764705882352941
 hour                                  no_crash   0.11764705882352941
 please                                no_crash   0.11764705882352941
 but                                   no_crash   0.11764705882352941
 expense                               no_crash   0.11764705882352941
 occurring                             no_crash   0.11764705882352941
 slowing                               no_crash   0.11764705882352941
 total                                 no_crash   0.11764705882352941
 over                                  no_crash   0.11764705882352941
 slip                                  no_crash   0.11764705882352941
 unexpectedly                          no_crash   0.11764705882352941
 completed                             no_crash   0.11764705882352941
 saw                                   no_crash   0.11764705882352941
 does                                  no_crash   0.11764705882352941
 alternator/                           no_crash   0.11764705882352941
 storm                                 no_crash   0.11764705882352941
 made                                  no_crash   0.11764705882352941
 without                               no_crash   0.11764705882352941
 rpms                                  no_crash   0.11764705882352941
 started                               no_crash   0.11764705882352941
 information                           no_crash   0.11764705882352941
 side                                  no_crash   0.11764705882352941
 if                                    no_crash   0.11764705882352941
 controlcable                          no_crash   0.11764705882352941
 head                                  no_crash   0.11764705882352941
 heard                                 no_crash   0.11764705882352941
 pull                                  no_crash   0.11764705882352941
 )                                     no_crash   0.11764705882352941
 68000                                 no_crash   0.11764705882352941
 stopped                               no_crash   0.11764705882352941
 pedal                                 no_crash   0.11764705882352941
 99v029000                             no_crash   0.11764705882352941
 pump                                  no_crash   0.11764705882352941
 burned                                no_crash   0.11764705882352941
 joints                                no_crash   0.11764705882352941
 corrected                             no_crash   0.11764705882352941
 walnut                                no_crash   0.11764705882352941
 lower                                 no_crash   0.11764705882352941
 r&r                                   no_crash   0.11764705882352941
 accurate                              no_crash   0.11764705882352941
 ea02-025                              no_crash   0.11764705882352941
 back                                  no_crash   0.11764705882352941
 themselves                            no_crash   0.11764705882352941
 ball                                  no_crash   0.11764705882352941
 gear                                  no_crash   0.11764705882352941
 yh                                    no_crash   0.11764705882352941
 become                                no_crash   0.11764705882352941
 properly                              no_crash   0.11764705882352941
 can't                                 no_crash   0.11764705882352941
 defective                             no_crash   0.11764705882352941
 do                                    no_crash   0.11764705882352941
 factory                               no_crash   0.11764705882352941
 referred                              no_crash   0.11764705882352941
 then                                  no_crash   0.11764705882352941
 following                             no_crash   0.11764705882352941
 parked                                no_crash   0.11764705882352941
 pressing                              no_crash   0.11764705882352941
 malfunctioned                         no_crash   0.11764705882352941
 rain                                  no_crash   0.11764705882352941
 shortening                            no_crash   0.11764705882352941
 blew                                  no_crash   0.11764705882352941
 stall                                 no_crash   0.11764705882352941
 further                               no_crash   0.11764705882352941
 took                                  no_crash   0.11764705882352941
 it's                                  no_crash   0.11764705882352941
 drive                                 no_crash   0.11764705882352941
 noise                                 no_crash   0.11764705882352941
 light                                 no_crash   0.11764705882352941
 brake                                 no_crash   0.11764705882352941
 manufactured                          no_crash   0.11764705882352941
 smoke                                 no_crash   0.11764705882352941
 battery                               no_crash   0.11764705882352941
 reimbursement                         no_crash   0.11764705882352941
 ;                                     no_crash   0.11764705882352941
 motor                                 no_crash   0.11764705882352941
 were                                  no_crash   0.11764705882352941
 performed                             no_crash   0.11764705882352941
 both                                  no_crash   0.11764705882352941
 into                                  no_crash   0.11764705882352941
 keep                                  no_crash   0.11764705882352941
 sitting                               no_crash   0.11764705882352941
 move                                  no_crash   0.11764705882352941
 leaking                               no_crash   0.11764705882352941
 totally                               no_crash   0.11764705882352941
 rolled                                no_crash   0.11764705882352941
 tune                                  no_crash   0.11764705882352941
 stayed                                no_crash   0.11764705882352941
 could                                 no_crash   0.11764705882352941
 leaks                                 no_crash   0.11764705882352941
 power                                 no_crash   0.11764705882352941
 coil                                  no_crash   0.11764705882352941
 bearing                               no_crash   0.11764705882352941
 97v017000                             no_crash   0.11764705882352941
 caused                                no_crash   0.11764705882352941
 driveshaft                            no_crash   0.11764705882352941
 itself                                no_crash   0.11764705882352941
 its                                   no_crash   0.11764705882352941
 off/on                                no_crash   0.11764705882352941
 separated                             no_crash   0.11764705882352941
 for                                   no_crash   0.11764705882352941
 gasket                                no_crash   0.11764705882352941
 just                                  no_crash   0.11764705882352941
 drivers                               no_crash   0.11764705882352941
 excessively                           no_crash   0.11764705882352941
 due                                   no_crash   0.11764705882352941
 cable                                 no_crash   0.11764705882352941
 owner's                               no_crash   0.11764705882352941
 first                                 no_crash   0.11764705882352941
 *ml                                   no_crash   0.11764705882352941
 mechanic                              no_crash   0.11764705882352941
 frame                                 no_crash   0.11764705882352941
 starter                               no_crash   0.11764705882352941
 start                                 no_crash   0.11764705882352941
 fire                                  no_crash   0.11764705882352941
 periodcally                           no_crash   0.11764705882352941
 reinspected                           no_crash   0.11764705882352941
 cannot                                no_crash   0.11764705882352941
 compartment                           no_crash   0.11764705882352941
 shift                                 no_crash   0.11764705882352941
 owned                                 no_crash   0.11764705882352941
 crash                                 no_crash   0.11764705882352941
 smelled                               no_crash   0.11764705882352941
 wear                                  no_crash   0.11764705882352941
 checked                               no_crash   0.11764705882352941
 foot                                  no_crash   0.11764705882352941
 son                                   no_crash   0.11764705882352941
 fail                                  no_crash   0.11764705882352941
 aware                                 no_crash   0.11764705882352941
 loss                                  no_crash   0.11764705882352941
 1                                     no_crash   0.11764705882352941
 steering                              no_crash   0.11764705882352941
 66900                                 no_crash   0.11764705882352941
 NAIVE_BAYES_MISSING_TOKEN_PROBABILITY no_crash  0.058823529411764705
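The probabilities in this table can be checked with exact fractions. Assuming add-one smoothing, the priors 0.25 and 0.75 and the missing-token rows 1/7 (0.142857...) and 1/17 (0.058823...) fit a training table of 5 crash and 15 no_crash complaints. On that reading, every crash probability is a multiple of 1/7 and every no_crash probability a multiple of 1/17. These document counts are inferred from the output, not stated in the source. The Python snippet below recomputes several rows with `fractions.Fraction` and checks them against the printed values. `N_DOCS`, `smoothed`, and the per-token document counts are hypothetical names and inferred values.

```python
import math
from fractions import Fraction

# Inferred (not stated in the source): 5 crash and 15 no_crash training documents,
# which matches the priors 0.25 / 0.75 and the missing-token rows 1/7 and 1/17.
N_DOCS = {"crash": 5, "no_crash": 15}
TOTAL = sum(N_DOCS.values())


def smoothed(docs_with_token: int, category: str) -> Fraction:
    """Add-one smoothed estimate (docs_with_token + 1) / (N_category + 2)."""
    return Fraction(docs_with_token + 1, N_DOCS[category] + 2)


# (token, category, documents assumed to contain the token, prob printed in the table)
CHECKS = [
    (".", "crash", 5, 0.8571428571428571),
    (".", "no_crash", 15, 0.9411764705882353),
    ("a", "crash", 4, 0.7142857142857143),
    ("vehicle", "no_crash", 9, 0.5882352941176471),
    ("hit", "crash", 3, 0.5714285714285714),
    ("deer", "crash", 1, 0.2857142857142857),
    ("walnut", "no_crash", 1, 0.11764705882352941),
]

for token, category, k, printed in CHECKS:
    value = smoothed(k, category)
    assert math.isclose(float(value), printed, rel_tol=1e-12), (token, category)
    print(f"{token:<8} {category:<9} ({k} + 1)/({N_DOCS[category]} + 2) = {value}")

for category, n in N_DOCS.items():
    # A token seen in zero documents of the category gives the missing-token value.
    print(f"{category:<9} prior = {Fraction(n, TOTAL)}, missing-token = {smoothed(0, category)}")
```

Under this reading, `.` in every crash complaint gives 6/7, and a token found in a single no_crash complaint gives 2/17, the smallest value listed for no_crash apart from the missing-token row.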

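The special rows show how a classifier can read this model table. NAIVE_BAYES_PRIOR_PROBABILITY holds each category's prior, and NAIVE_BAYES_MISSING_TOKEN_PROBABILITY presumably supplies the probability for a token that never occurred in that category's training documents. The Python sketch below scores a tokenized document against (token, category, prob) rows using the textbook Bernoulli Naive Bayes rule. For each vocabulary token it adds log p if the document contains the token and log(1 - p) if not. `classify` is a hypothetical helper, and the rule is an assumption, not a description of how Teradata's prediction function uses the model.

```python
import math
from typing import Dict, Iterable, Optional, Set, Tuple

PRIOR = "NAIVE_BAYES_PRIOR_PROBABILITY"
MISSING = "NAIVE_BAYES_MISSING_TOKEN_PROBABILITY"
MODEL_TYPE = "NAIVE_BAYES_TEXT_MODEL_TYPE"


def classify(
    model_rows: Iterable[Tuple[str, str, float]], doc_tokens: Iterable[str]
) -> Optional[str]:
    """Return the category with the highest Bernoulli Naive Bayes log score.

    model_rows are (token, category, prob) rows shaped like the model table.
    Assumed textbook rule: log P(c) + sum over vocabulary tokens t of
    log p(t|c) if t is in the document, else log(1 - p(t|c)), where
    p(t|c) falls back to the category's missing-token row.
    """
    priors: Dict[str, float] = {}
    missing: Dict[str, float] = {}
    probs: Dict[Tuple[str, str], float] = {}
    vocabulary: Set[str] = set()
    for token, category, prob in model_rows:
        if token == PRIOR:
            priors[category] = prob
        elif token == MISSING:
            missing[category] = prob
        elif token == MODEL_TYPE:
            continue  # metadata row such as ("NAIVE_BAYES_TEXT_MODEL_TYPE", "BERNOULLI", 1.0)
        else:
            probs[(token, category)] = prob
            vocabulary.add(token)

    present = set(doc_tokens) & vocabulary  # tokens outside the vocabulary are ignored
    best_category: Optional[str] = None
    best_score = -math.inf
    for category, prior in priors.items():
        score = math.log(prior)
        for token in vocabulary:
            p = probs.get((token, category), missing[category])
            # log1p(-p) computes log(1 - p) for tokens absent from the document.
            score += math.log(p) if token in present else math.log1p(-p)
        if score > best_score:
            best_category, best_score = category, score
    return best_category
```

Scoring in log space avoids underflow when the vocabulary has hundreds of tokens, as this model does. With smoothing, no stored probability is 0 or 1, so both logarithms stay finite.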