Input
- InputTable: complaints_tokenized, created by calling TextTokenizer with the table complaints (which appears in NaiveBayesTextClassifierTrainer2 Example: IsTokenized ('false')):
CREATE MULTISET TABLE complaints_tokenized AS (
  SELECT * FROM TextTokenizer (
    ON complaints PARTITION BY ANY
    USING
    InputLanguage ('en')
    OutputDelimiter (' ')
    OutputByWord ('true')
    Accumulate ('doc_id','category')
    TextColumn ('text_data')
  ) AS dt
) WITH DATA;
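Before training, it can be useful to confirm that tokenization kept every document and its category. The following query is a minimal sketch of such a check, not part of the original example. It uses only the doc_id and category columns that the Accumulate argument carries into complaints_tokenized; the aliases doc_count and token_rows are illustrative names.

```sql
-- Count distinct documents and token rows per category in the tokenized table.
-- doc_count and token_rows are illustrative aliases, not names from the example.
SELECT category,
       COUNT(DISTINCT doc_id) AS doc_count,
       COUNT(*) AS token_rows
FROM complaints_tokenized
GROUP BY category
ORDER BY category;
```

If a category is missing or its document count looks too low, the problem most likely lies in the input table or the tokenizer arguments rather than in the trainer call.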
SQL Call
This call creates a Bernoulli model table, complaints_tokens_model, by calling NaiveBayesTextClassifierTrainer2 with the table complaints_tokenized.
SELECT * FROM NaiveBayesTextClassifierTrainer2 (
  ON complaints_tokenized AS InputTable
  OUT TABLE ModelTable (complaints_tokens_model)
  USING
  TokenColumn ('token')
  IsTokenized ('TRUE')
  ModelType ('Bernoulli')
  DocIdColumns ('doc_id')
  DocCategoryColumn ('category')
) AS dt;
Output
This query returns the following table:
SELECT * FROM complaints_tokens_model ORDER BY prob DESC;
token                                 category  prob
------------------------------------- --------- --------------------
NAIVE_BAYES_TEXT_MODEL_TYPE           BERNOULLI 1.0
.                                     no_crash  0.9411764705882353
.                                     crash     0.8571428571428571
NAIVE_BAYES_PRIOR_PROBABILITY         no_crash  0.75
and                                   crash     0.7142857142857143
a                                     crash     0.7142857142857143
the                                   no_crash  0.7058823529411765
vehicle                               no_crash  0.5882352941176471
driver's                              crash     0.5714285714285714
approximately                         crash     0.5714285714285714
head-on                               crash     0.5714285714285714
not                                   crash     0.5714285714285714
hit                                   crash     0.5714285714285714
mph                                   crash     0.5714285714285714
air                                   crash     0.5714285714285714
deploy                                crash     0.5714285714285714
passenger's                           crash     0.5714285714285714
did                                   crash     0.5714285714285714
side                                  crash     0.5714285714285714
vehicle                               crash     0.5714285714285714
and                                   no_crash  0.5294117647058824
to                                    no_crash  0.47058823529411764
at                                    crash     0.42857142857142855
in                                    crash     0.42857142857142855
another                               crash     0.42857142857142855
on                                    crash     0.42857142857142855
consumer                              crash     0.42857142857142855
injuries                              crash     0.42857142857142855
driver                                crash     0.42857142857142855
deployed                              crash     0.42857142857142855
into                                  crash     0.42857142857142855
the                                   crash     0.42857142857142855
was                                   crash     0.42857142857142855
to                                    crash     0.42857142857142855
bags                                  crash     0.42857142857142855
front                                 crash     0.42857142857142855
airbags                               crash     0.42857142857142855
was                                   no_crash  0.4117647058823529
on                                    no_crash  0.35294117647058826
consumer                              no_crash  0.35294117647058826
...
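The model table mixes per-token rows with bookkeeping rows whose token value is NAIVE_BAYES_TEXT_MODEL_TYPE or NAIVE_BAYES_PRIOR_PROBABILITY. To look only at the highest-probability tokens in each category, a query along the following lines may help. It is a sketch that relies only on the token, category, and prob columns shown in the output; the cutoff of 5 rows per category is an arbitrary choice for illustration.

```sql
-- List the five highest-probability tokens per category,
-- skipping the model-type and prior-probability bookkeeping rows.
SELECT token, category, prob
FROM complaints_tokens_model
WHERE token NOT IN ('NAIVE_BAYES_TEXT_MODEL_TYPE', 'NAIVE_BAYES_PRIOR_PROBABILITY')
QUALIFY ROW_NUMBER() OVER (PARTITION BY category ORDER BY prob DESC) <= 5
ORDER BY category, prob DESC;
```

Because ROW_NUMBER breaks ties arbitrarily, tokens that share the same prob value (as many crash tokens do here) may appear in a different order from run to run.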