7.00.02 - Input - Aster Analytics

Teradata Aster® Analytics Foundation User Guide, Update 2

Product: Aster Analytics
Release Number: 7.00.02
Published: September 2017
Content Type: Programming Reference, User Guide
Publication ID: B700-1022-700K
Language: English (United States)
Last Update: 2018-04-17

The training table, complaints, is a log of vehicle complaints. The category column indicates whether the vehicle was in a crash (crash or no_crash).

LDATrainer Example Training Table complaints
doc_id | text_data | category
1 | consumer was driving approximately 45 mph hit a deer with the front bumper and then ran into an embankment head-on passenger's side air bag did deploy hit windshield and deployed outward. driver's side airbag cover opened but did not inflate it was still folded causing injuries. | crash
2 | when vehicle was involved in a crash totalling vehicle driver's side/ passenger's side air bags did not deploy. vehicle was making a left turn and was hit by a ford f350 traveling about 35 mph on the front passenger's side. driver hit his head-on the steering wheel. hurt his knee and received neck and back injuries. | crash
3 | consumer has experienced following problems; 1.) both lower ball joints wear out excessively; 2.) head gasket leaks; and 3.) cruise control would shut itself off while driving without foot pressing on brake pedal. | no_crash
... | ... | ...

The stop words file, stopwords.txt, contains one word per line:

a
an
in
is
to
into
was
the
and
this
with
they
but
will

To generate a tokenized, filtered input table for the LDATrainer function, apply the Text_Parser function to the training table:

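-- Store the parser output as complaints_traintoken, which the next query reads.
CREATE TABLE complaints_traintoken DISTRIBUTE BY HASH (doc_id) AS
SELECT * FROM Text_Parser (
  ON complaints
  TextColumn ('text_data')
  ToLowerCase ('true')
  Stemming ('false')
  -- Characters to strip from the text before tokenizing
  Punctuation ('\[.,?\!\]')
  ListPositions ('true')
  -- Drop the words listed in stopwords.txt
  StopWords ('stopwords.txt')
  RemoveStopWords ('true')
  -- Copy these input columns to every output row
  Accumulate ('doc_id', 'category')
);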
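The Text_Parser settings above can be approximated in a few lines of Python to show what each row of the output means. This is a minimal sketch, not Aster's implementation: it assumes punctuation characters are replaced by spaces, tokens are split on whitespace, and positions are 0-based word indexes counted before stop words are removed (an assumption that agrees with the complaints_traintoken excerpt, where position 1, the stop word "was", is missing). The names parse_text, STOP_WORDS and PUNCTUATION are hypothetical.

```python
import re

# Stop words from stopwords.txt, as listed above.
STOP_WORDS = {"a", "an", "in", "is", "to", "into", "was", "the",
              "and", "this", "with", "they", "but", "will"}

# Assumed equivalent of Punctuation ('\[.,?\!\]'): period, comma, ?, !.
PUNCTUATION = re.compile(r"[.,?!]")


def parse_text(doc_id, category, text, stop_words=STOP_WORDS):
    """Approximate Text_Parser with ToLowerCase ('true'), Stemming ('false'),
    ListPositions ('true'), RemoveStopWords ('true') and
    Accumulate ('doc_id', 'category'). Returns one row per distinct token:
    (doc_id, category, token, frequency, position)."""
    # Lowercase, replace punctuation with spaces, then split on whitespace.
    words = PUNCTUATION.sub(" ", text.lower()).split()
    positions = {}  # token -> 0-based word positions, in first-seen order
    for pos, word in enumerate(words):
        if word in stop_words:
            continue  # the stop word is dropped but still occupies a position
        positions.setdefault(word, []).append(pos)
    return [
        (doc_id, category, token, len(where), ",".join(str(p) for p in where))
        for token, where in positions.items()
    ]


if __name__ == "__main__":
    sample = "consumer was driving approximately 45 mph hit a deer"
    for row in parse_text(1, "crash", sample):
        print(row)
```

Run on the full doc_id 1 complaint, this sketch yields consumer with frequency 1 at position 0, hit with frequency 2 at positions 6,26, and side with frequency 2 at positions 21,32, in the same order as the excerpt. Stemming is omitted because the example sets Stemming ('false').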

The following query displays the tokenized, filtered table:

SELECT * FROM complaints_traintoken ORDER BY doc_id;
LDATrainer Example Tokenized and Filtered Input Table complaints_traintoken
doc_id | category | token | frequency | position
1 | crash | consumer | 1 | 0
1 | crash | driving | 1 | 2
1 | crash | approximately | 1 | 3
1 | crash | 45 | 1 | 4
1 | crash | mph | 1 | 5
1 | crash | hit | 2 | 6,26
1 | crash | deer | 1 | 8
1 | crash | front | 1 | 11
1 | crash | bumper | 1 | 12
1 | crash | then | 1 | 14
1 | crash | ran | 1 | 15
1 | crash | embankment | 1 | 18
1 | crash | head-on | 1 | 19
1 | crash | passenger's | 1 | 20
1 | crash | side | 2 | 21,32
... | ... | ... | ... | ...
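In complaints_traintoken, frequency is the number of positions listed for a token in a document: hit appears at positions 6 and 26, so its frequency is 2. The sketch below uses a hypothetical helper, term_counts, to read rows in this layout into per-document term counts (the bag-of-words form that a topic model such as LDA works from) and to check that invariant. It assumes the position column holds a comma-separated list of integers, as displayed; it is an illustration, not part of the Aster API.

```python
from collections import defaultdict


def term_counts(rows):
    """Group (doc_id, category, token, frequency, position) rows into
    {doc_id: {token: frequency}}, checking frequency against the positions."""
    counts = defaultdict(dict)
    for doc_id, _category, token, frequency, position in rows:
        # position is assumed to look like "6,26" (comma-separated integers).
        positions = [int(p) for p in str(position).split(",")]
        if frequency != len(positions):
            raise ValueError(
                f"doc {doc_id}, token {token!r}: frequency {frequency} "
                f"but {len(positions)} positions"
            )
        counts[doc_id][token] = frequency
    return dict(counts)


if __name__ == "__main__":
    excerpt = [
        (1, "crash", "consumer", 1, "0"),
        (1, "crash", "hit", 2, "6,26"),
        (1, "crash", "side", 2, "21,32"),
    ]
    print(term_counts(excerpt))
```

Keeping the check separate from the grouping makes it easy to drop once the input is trusted; the category column is ignored here because term counts do not depend on it.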