- tokens: Created by applying the TextTokenizer function to the training table complaints, a log of vehicle complaints.
In complaints, the category column indicates whether the car has been in a crash.
| doc_id | text_data | category |
|---|---|---|
| 1 | consumer was driving approximately 45 mph hit a deer with the front bumper and then ran into an embankment head-on passenger's side air bag did deploy hit windshield and deployed outward. driver's side airbag cover opened but did not inflate it was still folded causing injuries. | crash |
| 2 | when vehicle was involved in a crash totalling vehicle driver's side/ passenger's side air bags did not deploy. vehicle was making a left turn and was hit by a ford f350 traveling about 35 mph on the front passenger's side. driver hit his head-on the steering wheel. hurt his knee and received neck and back injuries. | crash |
| 3 | consumer has experienced following problems; 1.) both lower ball joints wear out excessively; 2.) head gasket leaks; and 3.) cruise control would shut itself off while driving without foot pressing on brake pedal. | no_crash |
This call creates the model table, complaints_tokens_model, by calling NaiveBayesTextClassifierTrainer. It builds the NaiveBayesTextClassifierTrainer input, tokens, inline by applying TextTokenizer to the table complaints and lowercasing each token.
CREATE MULTISET TABLE complaints_tokens_model AS (
  SELECT * FROM NaiveBayesTextClassifierTrainer (
    ON (
      SELECT * FROM NaiveBayesTextClassifierInternal (
        ON (
          SELECT doc_id, lower(token) AS token, category
          FROM TextTokenizer (
            ON complaints PARTITION BY ANY
            USING
            TextColumn ('text_data')
            OutputByWord ('true')
            Accumulate ('doc_id', 'category')
          ) AS dt1
        ) AS "input" PARTITION BY category
        USING
        TokenColumn ('token')
        ModelType ('Bernoulli')
        DocIDColumns ('doc_id')
        DocCategoryColumn ('category')
      ) AS dt2
    ) PARTITION BY 1
  ) AS dt3
) WITH DATA;
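To make the two stages of the call above more concrete, the following Python sketch mimics them on a toy scale: it first emits one lowercased token row per word (carrying doc_id and category along, as Accumulate does), then estimates, per category, how likely each token is to appear in a document, which is the idea behind a Bernoulli model. This is an illustration under assumptions, not the function's actual implementation: the sample rows, the regex-based word splitting, the add-one smoothing, and the names tokenize and train_bernoulli are all hypothetical, and the real model table's schema and smoothing may differ.

```python
import re
from collections import defaultdict

# Hypothetical miniature of the complaints table: (doc_id, text_data, category).
complaints = [
    (1, "Air bag did not deploy in crash", "crash"),
    (2, "Vehicle hit a deer, air bag deployed", "crash"),
    (3, "Head gasket leaks and cruise control shuts off", "no_crash"),
]

def tokenize(rows):
    """Yield one (doc_id, token, category) row per word, lowercased.

    Splitting on a simple regex is an assumption; TextTokenizer may
    treat punctuation differently.
    """
    for doc_id, text, category in rows:
        for word in re.findall(r"[A-Za-z0-9']+", text):
            yield doc_id, word.lower(), category

def train_bernoulli(token_rows):
    """Estimate P(token present | category) with add-one smoothing (assumed)."""
    docs_in_cat = defaultdict(set)    # category -> doc_ids seen in it
    docs_with_tok = defaultdict(set)  # (token, category) -> doc_ids containing token
    vocab = set()
    for doc_id, token, category in token_rows:
        docs_in_cat[category].add(doc_id)
        docs_with_tok[(token, category)].add(doc_id)
        vocab.add(token)

    model = {}
    for category, docs in docs_in_cat.items():
        for token in vocab:
            present = len(docs_with_tok.get((token, category), ()))
            # Bernoulli model: count documents, not token occurrences.
            model[(token, category)] = (present + 1) / (len(docs) + 2)
    return model

model = train_bernoulli(tokenize(complaints))
```

A lookup such as `model[("air", "crash")]` then gives the smoothed share of crash documents that mention "air". Because the Bernoulli variant counts documents rather than occurrences, a token repeated many times within one complaint contributes no more than a token that appears once.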
This query returns the following table:
SELECT * FROM complaints_tokens_model;