Input
- InputTable: complaints, a log of vehicle complaints
In complaints, the category column indicates whether the car has been in a crash.
doc_id | text_data | category |
---|---|---|
1 | consumer was driving approximately 45 mph hit a deer with the front bumper and then ran into an embankment head-on passenger's side air bag did deploy hit windshield and deployed outward. driver's side airbag cover opened but did not inflate it was still folded causing injuries. | crash |
2 | when vehicle was involved in a crash totalling vehicle driver's side/ passenger's side air bags did not deploy. vehicle was making a left turn and was hit by a ford f350 traveling about 35 mph on the front passenger's side. driver hit his head-on the steering wheel. hurt his knee and received neck and back injuries. | crash |
3 | consumer has experienced following problems; 1.) both lower ball joints wear out excessively; 2.) head gasket leaks; and 3.) cruise control would shut itself off while driving without foot pressing on brake pedal. | no_crash |
... | ... | ... |
SQL Call
This call creates a Bernoulli model table, complaints_tokens_model, by calling NaiveBayesTextClassifierTrainer2 on the input table complaints. Because IsTokenized is 'FALSE', the function tokenizes the text internally.
SELECT * FROM NaiveBayesTextClassifierTrainer2 (
  ON complaints AS InputTable
  OUT TABLE ModelTable (complaints_tokens_model)
  USING
  TextColumn ('text_data')
  IsTokenized ('FALSE')
  ModelType ('Bernoulli')
  DocIdColumns ('doc_id')
  DocCategoryColumn ('category')
) AS dt;
Output
This query returns the following table:
SELECT * FROM complaints_tokens_model ORDER BY prob desc;
token | category | prob |
---|---|---|
NAIVE_BAYES_TEXT_MODEL_TYPE | BERNOULLI | 1.0 |
NAIVE_BAYES_PRIOR_PROBABILITY | no_crash | 0.75 |
deploy | crash | 0.7142857142857143 |
a | crash | 0.7142857142857143 |
and | crash | 0.7142857142857143 |
driver | crash | 0.7142857142857143 |
the | no_crash | 0.7058823529411765 |
vehicl | no_crash | 0.5882352941176471 |
passeng | crash | 0.5714285714285714 |
airbag | crash | 0.5714285714285714 |
approxim | crash | 0.5714285714285714 |
head-on | crash | 0.5714285714285714 |
bag | crash | 0.5714285714285714 |
did | crash | 0.5714285714285714 |
vehicl | crash | 0.5714285714285714 |
air | crash | 0.5714285714285714 |
mph | crash | 0.5714285714285714 |
not | crash | 0.5714285714285714 |
side | crash | 0.5714285714285714 |
hit | crash | 0.5714285714285714 |
and | no_crash | 0.5294117647058824 |
to | no_crash | 0.47058823529411764 |
into | crash | 0.42857142857142855 |
to | crash | 0.42857142857142855 |
anoth | crash | 0.42857142857142855 |
in | crash | 0.42857142857142855 |
on | crash | 0.42857142857142855 |
the | crash | 0.42857142857142855 |
at | crash | 0.42857142857142855 |
crash | crash | 0.42857142857142855 |
injuri | crash | 0.42857142857142855 |
consum | crash | 0.42857142857142855 |
front | crash | 0.42857142857142855 |
was | crash | 0.42857142857142855 |
was | no_crash | 0.4117647058823529 |
has | no_crash | 0.35294117647058826 |
is | no_crash | 0.35294117647058826 |
when | no_crash | 0.35294117647058826 |
on | no_crash | 0.35294117647058826 |
consum | no_crash | 0.35294117647058826 |
... | ... | ... |
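To help read the prob column, here is a minimal Python sketch of how per-token probabilities in a Bernoulli Naive Bayes model can be computed. It is not the function's actual implementation. It assumes add-one smoothing, (documents in the category containing the token + 1) / (documents in the category + 2), which is consistent with the values above (0.7142857142857143 equals 5/7 and 0.7058823529411765 equals 12/17) but is not stated in this example. The helper name bernoulli_token_probs and the toy corpus are hypothetical, and the document counts (5 crash, 15 no_crash) are only inferred from those fractions and the 0.75 prior.

```python
from collections import Counter, defaultdict


def bernoulli_token_probs(docs):
    """Estimate priors and smoothed token probabilities.

    docs: iterable of (tokens, category) pairs.
    Returns (priors, probs), where probs[(token, category)] is the
    add-one-smoothed fraction of that category's documents that
    contain the token (assumed smoothing, see lead-in).
    """
    docs_per_category = Counter()
    doc_freq = defaultdict(Counter)  # category -> token -> docs containing it

    for tokens, category in docs:
        docs_per_category[category] += 1
        # Bernoulli model: a token counts once per document,
        # no matter how often it occurs in that document.
        for token in set(tokens):
            doc_freq[category][token] += 1

    total_docs = sum(docs_per_category.values())
    priors = {c: n / total_docs for c, n in docs_per_category.items()}
    probs = {
        (token, c): (df + 1) / (docs_per_category[c] + 2)
        for c, counts in doc_freq.items()
        for token, df in counts.items()
    }
    return priors, probs


# Hypothetical toy corpus: 5 crash documents (4 contain "deploy")
# and 15 no_crash documents (11 contain "the").
toy_docs = (
    [(["deploy", "deploy", "hit"], "crash")] * 4
    + [(["hit"], "crash")]
    + [(["the", "brake"], "no_crash")] * 11
    + [(["brake"], "no_crash")] * 4
)

priors, probs = bernoulli_token_probs(toy_docs)
print(priors["no_crash"])           # share of documents labeled no_crash
print(probs[("deploy", "crash")])   # (4 + 1) / (5 + 2)
print(probs[("the", "no_crash")])   # (11 + 1) / (15 + 2)
```

Under these assumptions, repeating "deploy" within one document does not change its probability, which is the main difference from a multinomial model that counts every occurrence.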