TextParser Example 1: StopWords without StemmingExceptions - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.00, 1.0
Published: May 2019
Language: English (United States)
Last Update: 2019-11-22
dita:id: B700-4003
Lifecycle: previous
Product Category: Teradata Vantage™

Input

  • InputTable: complaints, a log of vehicle complaints.

    The category column indicates whether the vehicle was in a crash.

  • Stop words file: stopwords.txt, which is preinstalled on the ML Engine (shown in the TextClassifierTrainer Example).
complaints
doc_id | text_data | category
1 | consumer was driving approximately 45 mph hit a deer with the front bumper and then ran into an embankment head-on passenger's side air bag did deploy hit windshield and deployed outward. driver's side airbag cover opened but did not inflate it was still folded causing injuries. | crash
2 | when vehicle was involved in a crash totalling vehicle driver's side/ passenger's side air bags did not deploy. vehicle was making a left turn and was hit by a ford f350 traveling about 35 mph on the front passenger's side. driver hit his head-on the steering wheel. hurt his knee and received neck and back injuries. | crash
3 | consumer has experienced following problems; 1.) both lower ball joints wear out excessively; 2.) head gasket leaks; and 3.) cruise control would shut itself off while driving without foot pressing on brake pedal. | no_crash
... | ... | ...

SQL Call

SELECT * FROM TextParser (
  ON complaints
  USING
  TextColumn ('text_data')
  ToLowerCase ('true')
  Stemming ('false')
  OutputByWord ('true')
  Punctuation ('\[.,?\!\]')
  RemoveStopWords ('true')
  ListPositions ('true')
  Accumulate ('doc_id', 'category')
  StopWords ('stopwords.txt')
) AS dt ORDER BY doc_id;
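To see where each output column comes from, the parameter combination in this call can be approximated outside the database. The Python below is a minimal hypothetical sketch, not Teradata's implementation. The names `text_parser`, `STOP_WORDS`, `PUNCTUATION`, and `DOC_1` are illustrative. The seven-word stop list is an assumption inferred from the sample output, because the contents of stopwords.txt are not shown here. Replacing punctuation with whitespace before splitting is also a guess. No stemming is applied, matching Stemming ('false').

```python
import re

# Hypothetical stand-in for stopwords.txt: only the words that the sample
# output implies were dropped from document 1. The real list is not shown.
STOP_WORDS = {"a", "an", "and", "into", "the", "was", "with"}

# Character class assumed equivalent to Punctuation ('\[.,?\!\]').
PUNCTUATION = re.compile(r"[.,?!]")


def text_parser(rows, stop_words, to_lower=True):
    """Sketch of ToLowerCase, OutputByWord, RemoveStopWords, ListPositions.

    rows: iterable of (doc_id, text, category) tuples; doc_id and category
    are passed through, mirroring Accumulate ('doc_id', 'category').
    Yields (doc_id, category, token, frequency, location) tuples, one per
    distinct token per document, in order of the token's first position.
    """
    for doc_id, text, category in rows:
        if to_lower:
            text = text.lower()
        # Assumption: punctuation becomes whitespace, then split on whitespace.
        words = PUNCTUATION.sub(" ", text).split()
        positions = {}  # token -> 0-based word positions (insertion-ordered)
        for pos, word in enumerate(words):
            if word in stop_words:
                continue  # stop word: no row, but its position is still counted
            positions.setdefault(word, []).append(pos)
        for token, locs in positions.items():
            yield (doc_id, category, token, len(locs), ",".join(map(str, locs)))


# Usage: document 1 from the complaints table.
DOC_1 = ("consumer was driving approximately 45 mph hit a deer with the front "
         "bumper and then ran into an embankment head-on passenger's side air "
         "bag did deploy hit windshield and deployed outward. driver's side "
         "airbag cover opened but did not inflate it was still folded causing "
         "injuries.")

for row in list(text_parser([(1, DOC_1, "crash")], STOP_WORDS))[:12]:
    print(*row)
```

For document 1, the first twelve rows this prints match the twelve rows in the Output section. Later rows may differ, because the real stop word list is presumably larger than this stand-in.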

Output

doc_id category token frequency location
1 crash consumer 1 0
1 crash driving 1 2
1 crash approximately 1 3
1 crash 45 1 4
1 crash mph 1 5
1 crash hit 2 6,26
1 crash deer 1 8
1 crash front 1 11
1 crash bumper 1 12
1 crash then 1 14
1 crash ran 1 15
1 crash embankment 1 18
... ... ... ... ...
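The location column can be checked by hand against the input text. This short Python sketch rests on assumptions drawn only from the sample: location is a 0-based word index into text_data counted before stop words are removed, and the rows appear in first-occurrence order. `SHOWN` and `WORDS` are illustrative names that hold the rows above and the opening words of document 1. Under these assumptions, positions that no row claims are the words removed as stop words.

```python
# Rows shown above for doc_id 1: token -> list of locations.
SHOWN = {
    "consumer": [0], "driving": [2], "approximately": [3], "45": [4],
    "mph": [5], "hit": [6, 26], "deer": [8], "front": [11], "bumper": [12],
    "then": [14], "ran": [15], "embankment": [18],
}

# Opening words of doc_id 1's text_data (no punctuation in this span).
WORDS = ("consumer was driving approximately 45 mph hit a deer with the front "
         "bumper and then ran into an embankment head-on passenger's side air "
         "bag did deploy hit windshield").split()

# Each reported location should index its token in the unfiltered word list.
for token, locs in SHOWN.items():
    for loc in locs:
        assert WORDS[loc] == token, (token, loc)

# Positions 0..18 (up to the last shown row) that no row claims.
claimed = {loc for locs in SHOWN.values() for loc in locs}
skipped = [(i, WORDS[i]) for i in range(19) if i not in claimed]
print(skipped)
# [(1, 'was'), (7, 'a'), (9, 'with'), (10, 'the'), (13, 'and'), (16, 'into'), (17, 'an')]
```

Every skipped word is a common function word, which is consistent with RemoveStopWords ('true') removing them while ListPositions ('true') still counts their positions.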