Input
The input table is a log of vehicle complaints. The category column indicates whether the car has been in a crash.
complaints
doc_id | text_data | category
1 | consumer was driving approximately 45 mph hit a deer with the front bumper and then ran into an embankment head-on passenger's side air bag did deploy hit windshield and deployed outward. driver's side airbag cover opened but did not inflate it was still folded causing injuries. | crash
2 | when vehicle was involved in a crash totalling vehicle driver's side/ passenger's side air bags did not deploy. vehicle was making a left turn and was hit by a ford f350 traveling about 35 mph on the front passenger's side. driver hit his head-on the steering wheel. hurt his knee and received neck and back injuries. | crash
3 | consumer has experienced following problems; 1.) both lower ball joints wear out excessively; 2.) head gasket leaks; and 3.) cruise control would shut itself off while driving without foot pressing on brake pedal. | no_crash
... | ... | ...
SQL Call
SELECT * FROM TextTokenizer (
ON complaints AS "input" PARTITION BY ANY
USING
InputLanguage ('en')
OutputDelimiter (' ')
OutputByWord ('true')
Accumulate ('doc_id')
TextColumn ('text_data')
) AS dt ORDER BY doc_id, sn, token;
Output
doc_id | sn | token
1 | 1 | consumer
1 | 2 | was
1 | 3 | driving
1 | 4 | approximately
1 | 5 | 45
1 | 6 | mph
1 | 7 | hit
1 | 8 | a
1 | 9 | deer
1 | 10 | with
1 | 11 | the
1 | 12 | front
... | ... | ...
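To make the shape of the output concrete, the following Python sketch reproduces the first rows shown above. It is an illustration only, not the TextTokenizer implementation: it assumes that splitting on whitespace is enough for this sample, and it says nothing about how the function treats punctuation such as "passenger's" or "1.)", which the output excerpt does not show. The helper name tokenize_by_word and the shortened sample text are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def tokenize_by_word(rows: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, int, str]]:
    """Yield (doc_id, sn, token) rows, one per whitespace-separated word.

    Assumption: plain whitespace splitting; the real function may apply
    language-specific rules that are not visible in the excerpt.
    """
    for doc_id, text in rows:
        # sn restarts at 1 for each document, matching the Output table.
        for sn, token in enumerate(text.split(), start=1):
            yield doc_id, sn, token


# Shortened prefix of the text_data value for doc_id 1 (sample data only).
complaints = [
    (1, "consumer was driving approximately 45 mph hit a deer with the front bumper"),
]

rows = list(tokenize_by_word(complaints))
for doc_id, sn, token in rows:
    print(f"{doc_id} | {sn} | {token}")
```

Each input row expands into one output row per word, with doc_id carried along (the role of Accumulate in the SQL call) and sn giving the word's position within that document.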