Input
- tokens: Created by applying the TextTokenizer function to the training table complaints, a log of vehicle complaints
In complaints, the category column indicates whether the car has been in a crash.
doc_id | text_data | category |
---|---|---|
1 | consumer was driving approximately 45 mph hit a deer with the front bumper and then ran into an embankment head-on passenger's side air bag did deploy hit windshield and deployed outward. driver's side airbag cover opened but did not inflate it was still folded causing injuries. | crash |
2 | when vehicle was involved in a crash totalling vehicle driver's side/ passenger's side air bags did not deploy. vehicle was making a left turn and was hit by a ford f350 traveling about 35 mph on the front passenger's side. driver hit his head-on the steering wheel. hurt his knee and received neck and back injuries. | crash |
3 | consumer has experienced following problems; 1.) both lower ball joints wear out excessively; 2.) head gasket leaks; and 3.) cruise control would shut itself off while driving without foot pressing on brake pedal. | no_crash |
... | ... | ... |
SQL Call
This call creates the model table, complaints_tokens_model, by calling NaiveBayesTextClassifierTrainer. The trainer's input table, tokens, is built inline by applying TextTokenizer to the complaints table and lowercasing each token.
CREATE MULTISET TABLE complaints_tokens_model AS (
  SELECT * FROM NaiveBayesTextClassifierTrainer (
    ON (
      SELECT * FROM NaiveBayesTextClassifierInternal (
        ON (
          SELECT doc_id, lower(token) AS token, category
          FROM TextTokenizer (
            ON complaints PARTITION BY ANY
            USING
            TextColumn ('text_data')
            OutputByWord ('true')
            Accumulate ('doc_id', 'category')
          ) AS dt1
        ) AS "input" PARTITION BY category
        USING
        TokenColumn ('token')
        ModelType ('Bernoulli')
        DocIDColumns ('doc_id')
        DocCategoryColumn ('category')
      ) AS dt2
    ) PARTITION BY 1
  ) AS dt3
) WITH DATA;
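To make the shape of the trainer's input concrete, the following Python sketch mimics what the innermost query produces: one lowercased token per row, with doc_id and category carried along. It is only an illustration. The `token_rows` helper is hypothetical, and the plain whitespace split is an assumption that does not reproduce TextTokenizer's actual tokenization rules.

```python
def token_rows(doc_id, text, category):
    """Return one (doc_id, token, category) row per whitespace-separated word.

    Hypothetical helper: a plain whitespace split stands in for TextTokenizer,
    whose real tokenization rules are not reproduced here.
    """
    return [(doc_id, word.lower(), category) for word in text.split()]


# A shortened, made-up text in the style of the no_crash complaints.
rows = token_rows(3, "Head gasket leaks", "no_crash")
for row in rows:
    print(row)
```

Each printed tuple corresponds to one row of the derived table that the call passes to NaiveBayesTextClassifierInternal.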
Output
This query returns the following table:
SELECT * FROM complaints_tokens_model;
token | category | prob |
---|---|---|
ASTER_NAIVE_BAYES_TEXT_MODEL_TYPE | BERNOULLI | 1 |
been | crash | 0.285714285714286 |
been | no_crash | 0.235294117647059 |
accurate | no_crash | 0.117647058823529 |
joints | no_crash | 0.117647058823529 |
shift | no_crash | 0.117647058823529 |
about | crash | 0.285714285714286 |
about | no_crash | 0.117647058823529 |
bag | crash | 0.285714285714286 |
... | ... | ... |
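The prob values in this table are consistent with an add-one (Laplace) smoothed Bernoulli estimate: (documents in the category that contain the token + 1) / (documents in the category + 2). The Python sketch below works through that arithmetic. Both the smoothing formula and the document counts are assumptions: the source states neither, and the counts (5 crash documents, 15 no_crash documents) are hypothetical values chosen only because they reproduce the listed ratios. Other counts with the same ratios would fit equally well.

```python
def bernoulli_prob(docs_with_token, docs_in_category):
    """Add-one smoothed estimate of P(token present | category).

    Assumed formula, not confirmed by the source.
    """
    return (docs_with_token + 1) / (docs_in_category + 2)


# Hypothetical per-category document counts, not taken from the source.
crash_docs, no_crash_docs = 5, 15

examples = {
    ("been", "crash"): bernoulli_prob(1, crash_docs),        # 2/7
    ("been", "no_crash"): bernoulli_prob(3, no_crash_docs),  # 4/17
    ("joints", "no_crash"): bernoulli_prob(1, no_crash_docs),  # 2/17
}
for (token, category), prob in examples.items():
    print(token, category, prob)
```

Under these assumptions, 2/7, 4/17, and 2/17 match the 0.2857..., 0.2352..., and 0.1176... entries shown for been and joints.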