Using Text Analysis with Aster R

Teradata Aster® R User Guide Update 3

prodname: Aster R
vrm_release: 7.00.02.01
created_date: December 2017
category: Programming Reference, User Guide
featnum: B700-1033-700K

This section uses a log of vehicle complaints that have been categorized as crash-related or not crash-related. You can use this log to build a Naïve Bayes Text Classifier model, and then apply the model to new log data to predict whether each complaint is associated with a crash.

It is assumed that the training and test datasets are already in the Aster Database.

This is the table containing the training dataset.

doc_id | text_data | category
1 | consumer was driving approximately 45 mph hit a deer with the front bumper and then ran into an embankment head-on passenger's side air bag did deploy hit windshield and deployed outward. driver's side airbag cover opened but did not inflate it was still folded causing injuries. | crash
2 | when vehicle was involved in a crash totalling vehicle driver's side/ passenger's side air bags did not deploy. vehicle was making a left turn and was hit by a ford f350 traveling about 35 mph on the front passenger's side. driver hit his head-on the steering wheel. hurt his knee and received neck and back injuries. | crash
3 | consumer has experienced following problems; 1.) both lower ball joints wear out excessively; 2.) head gasket leaks; and 3.) cruise control would shut itself off while driving without foot pressing on brake pedal. | no_crash
4 | transfer case was repaired under recall. after the work was completed noise was heard intermittently. consumer took vehicle back to dealer. the dealer re-inspected vehicle and informed the owner that the driveshaft was hitting the transfer case. the manufacturer has been notified. | no_crash
5 | transmission would start to slip when traveling just 10mph. the rpms would be over 3 thousand. had vehicle checked at dealership & was informed transmission was stuck & that it's a factory defect almost blew up. also speedometer does not keep accurate speeds. if speed is increased, it would fail to work. this was referred to mechanic by manufacturer. | no_crash
6 | due to the defective ignition cable which burned the coil the vehicle stalled unexpectedly which could have resulted in a crash. also dealer replaced the r&r drive belts/speed control cable and performed vehicle tune up. please provide further information. | no_crash
7 | when switch is turned on windshield wipers would not work properly. would have to jiggle switch & then wipers would move. wipers do turn off/on by themselves. recall 97v017000. | no_crash
8 | consumer was driving in a rain storm when the windshield wipers stopped this happened periodically. | no_crash
9 | at 66900 miles transmission has malfunctioned and will not shift into first gear. repairs were made at owner's expense wants reimbursement. *ml | no_crash
10 | when truck was sitting on an incline it rolled on its own. manufacturer was aware of the problem. problem has not been corrected. the truck is owned by walnut hill recker manufactured in 1998. | no_crash
11 | car engine raced while slowing to park. car lurched forward and crashed into a fence and a building. car had been in shop approximately one week prior to incident for high idle condition. | crash
12 | rear ended another vehicle at 65 to 70mph and neither driver's side or passenger's side airbags deployed. dealer has vehicle. | crash
13 | while vehicle was parked for an hour a fire started on the left side of the engine compartment. owners son smelled smoke owner saw fire coming from around drivers side front wheel. referenced in ea02-025 | no_crash
14 | after vehicle was repaired under recall 99v029000 ignition switch the airbag light stayed on . the dealer and the manufacturer has been notified. | no_crash
15 | electrical control module is shortening out causing the vehicle to stall. engine will become totally inoperative. consumer had to change alternator/ battery and starter and module replaced 4 times but defect still occurring cannot determine what is causing the problem. | no_crash
16 | at 68000 miles power steering broke off the housing pump causing total loss of power steering which also caused the vehicle to shut down. | no_crash
17 | on two occasions dual airbags did not deploy. consumer rear-ended another vehicle at approximately 50 mph and at 80 mph hit a truck head-on upon impact air bags did not deploy. driver sustained injuries. dealer did not determine why air bags did not deploy. | crash
18 | sunroof is leaking. | no_crash
19 | motor and the frame separated from vehicle. manufacturer will be notified. | no_crash
20 | rear front wheel bearing broke causing vehicle to pull to the left when slowing down. consumer had brake's replaced about four times and still dealer can't determine the problem. | no_crash
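
The classifier in this section is trained on individual tokens rather than on whole complaints, so it can help to picture the training data as one row per token. The following is a rough local sketch in base R of that shape, using two rows from the training table. It does not call Aster or TextTokenizer(); the splitting rule (lowercase, split on anything that is not a letter or digit) is an assumption made for illustration and may differ from what TextTokenizer() actually produces.

```r
# Local illustration only: no Aster connection and no TextTokenizer() call.
# Two rows copied from the training table above.
complaints <- data.frame(
  doc_id = c(8, 18),
  text_data = c(
    "consumer was driving in a rain storm when the windshield wipers stopped this happened periodically.",
    "sunroof is leaking."
  ),
  category = c("no_crash", "no_crash"),
  stringsAsFactors = FALSE
)

# Assumed splitting rule: lowercase, then split on runs of characters
# that are not letters or digits. Drop any empty strings that result.
token_list <- strsplit(tolower(complaints$text_data), "[^a-z0-9]+")
token_list <- lapply(token_list, function(x) x[nzchar(x)])

# Repeat doc_id and category once per token: one row per token.
n_tokens <- lengths(token_list)
tokens <- data.frame(
  doc_id = rep(complaints$doc_id, n_tokens),
  token = unlist(token_list),
  category = rep(complaints$category, n_tokens),
  stringsAsFactors = FALSE
)

print(tokens)
```

Each token row carries the doc_id and category of the complaint it came from, which is the same set of columns (doc_id, token, category) that the training step consumes.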

This is the table containing the test dataset.

doc_id | text_data
1 | ELECTRICAL CONTROL MODULE IS SHORTENING OUT, CAUSING THE VEHICLE TO STALL. ENGINE WILL BECOME TOTALLY INOPERATIVE. CONSUMER HAD TO CHANGE ALTERNATOR/ BATTERY AND STARTER, AND MODULE REPLACED 4 TIMES, BUT DEFECT STILL OCCURRING CANNOT DETERMINE WHAT IS CAUSING THE PROBLEM.
2 | ABS BRAKES FAIL TO OPERATE PROPERLY, AND AIR BAGS FAILED TO DEPLOY DURING A CRASH AT APPROX. 28 MPH IMPACT. MANUFACTURER NOTIFIED.
3 | WHILE DRIVING AT 60 MPH GAS PEDAL GOT STUCK DUE TO THE RUBBER THAT IS AROUND THE GAS PEDAL.
4 | THERE IS A KNOCKING NOISE COMING FROM THE CATALYTIC CONVERTER, AND THE VEHICLE IS STALLING. ALSO, HAS PROBLEM WITH THE STEERING.
5 | CONSUMER WAS MAKING A TURN, DRIVING AT APPROX 5-10 MPH WHEN CONSUMER HIT ANOTHER VEHICLE. UPON IMPACT, DUAL AIRBAGS DID NOT DEPLOY. ALL DAMAGE WAS DONE FROM ENGINE TO TRANSMISSION, TO THE FRONT OF VEHICLE, AND THE VEHICLE CONSIDERED A TOTAL LOSS.
6 | WHEEL BEARING AND HUBS CRACKED, CAUSING THE METAL TO GRIND WHEN MAKING A RIGHT TURN. ALSO WHEN APPLYING THE BRAKES, PEDAL GOES TO THE FLOOR, CAUSE UNKNOWN. WAS ADVISED BY MIDAS NOT TO DRIVE VEHICLE- WHEELE COULD COME OFF.
7 | DRIVING ABOUT 5-10 MPH, THE VEHICLE HAD A LOW FRONTAL IMPACT IN WHICH THE OTHER VEHICLE HAD NO DAMAGES. UPON IMPACT, DRIVER'S AND THE PASSENGER'S AIR BAGS DID NOT DEPLOY, RESULTING IN INJURIES. PLEASE PROVIDE FURTHER INFORMATION AND VIN#.
8 | THE AIR BAG WARNING LIGHT HAS COME ON, INDICATING AIRBAGS ARE INOPERATIVE. THEY WERE FIXED ONE AT THE TIME, BUT PROBLEM HAS REOCCURRED.
9 | CONSUMER WAS DRIVING WEST WHEN THE OTHER CAR WAS GOING EAST. THE OTHER CAR TURNED IN FRONT OF CONSUMER'S VEHICLE, CONSUMER HIT OTHER VEHICLE AND STARTED TO SPIN AROUND, COULDN'T STOP, RESULTING IN A CRASH. UPON IMPACT, AIRBAGS DIDN'T DEPLOY.
10 | WHILE DRIVING ABOUT 65 MPH AND THE TRANSMISSION MADE A STRANGE NOISE, AND THE LEFT FRONT AXLE LOCKED UP. THE DEALER HAS REPAIRED THE VEHICLE.

This example shows the steps to build a Naïve Bayes Text Classifier model and then apply the model to the new log data.

  1. Create a ta.data.frame of tokens from the training dataset, which is in an Aster table named "complaints". This step uses the Aster Analytics function TextTokenizer().
    tadf_nbayes_tokens <- ta.data.frame(
      "SELECT doc_id, lower(token) AS token, category
       FROM TextTokenizer (
         ON complaints PARTITION BY ANY
         TextColumn ('text_data')
         OutputByWord ('true')
         Accumulate ('doc_id', 'category'))",
      sourceType = "query")
  2. Train a new Naïve Bayes Text Classifier using the function aa.naivebayes.textclassifier.train().
    nbayes_model_1 <- aa.naivebayes.textclassifier.train(
      tadf_nbayes_tokens,
      partition.column = "category",
      token.column = "token",
      doc.category.column = "category")
    The output of the function aa.naivebayes.textclassifier.train() is a list.
  3. Use the "[[" operator to access the first item in the list, which is the model.
    nbayes_model <- nbayes_model_1[[1]]
  4. Apply the model to the new log data from the test dataset.
    1. Create a ta.data.frame with the new log data. The new data is in an Aster table named "nbayes_test".
      tadf_nbayes_test <- ta.data.frame(
        "SELECT doc_id, lower(token) AS token
         FROM TextTokenizer (
           ON nbayes_test PARTITION BY ANY
           TextColumn ('text_data')
           OutputByWord ('true')
           Accumulate ('doc_id'))",
        sourceType = "query")
    2. Predict the category ('crash' or 'no_crash') of each complaint in the new log data.
      aa.naivebayes.textclassifier.predict(
        object = nbayes_model,
        newdata = tadf_nbayes_test,
        input.token.column = 'token',
        doc.id.columns = 'doc_id',
        model.type = "BERNOULLI",
        model.token.column = 'token',
        model.category.column = 'category',
        model.prob.column = 'prob',
        newdata.partition.column = 'doc_id')
      The output lists a loglik value for each combination of doc_id and category:
      $result
         doc_id    prediction        loglik
      1       1      no_crash     -98.37344
      2       1         crash    -131.94910
      3       3      no_crash     -74.45876
      4       3         crash    -104.57805
      5       5         crash    -115.41886
      6       5      no_crash    -117.81438
      7       7         crash    -111.70604
      8       7      no_crash    -115.48061
      9       9      no_crash    -108.85304
      10      9         crash    -117.02385
      11      2      no_crash     -93.41932
      12      2         crash    -106.37023
      13      4      no_crash     -80.04840
      14      4         crash    -109.84042
      15      6      no_crash    -116.11213
      16      6         crash    -131.70746
      17      8      no_crash     -91.97336
      18      8         crash    -111.05259
      19     10      no_crash     -82.65103
      20     10         crash    -105.11395
      
      attr(,"class")
      [1] "aa.naivebayes.textclassifier.predict"
      Warning message:
      In ta.show(tadf, maxRows) :
        Printing rows in random order since base table/view is
          neither ordered nor have row_names column.
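      The "Warning message" in the output is informational, not an error: the rows are printed in arbitrary order because the underlying table or view is unordered and has no row_names column.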
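
The prediction output above has two rows per doc_id, one for each category. A minimal sketch in base R of reducing that to a single category per complaint follows. It assumes that the row with the larger (less negative) loglik is the more likely category, which is the usual reading of a log-likelihood but is not stated in the output itself. The data frame is typed in by hand from the printed result, so the sketch runs without an Aster connection; how to bring $result into a local data frame is outside its scope.

```r
# Values copied by hand from the $result output above.
result <- data.frame(
  doc_id = rep(c(1, 3, 5, 7, 9, 2, 4, 6, 8, 10), each = 2),
  prediction = c(
    "no_crash", "crash",    "no_crash", "crash",
    "crash",    "no_crash", "crash",    "no_crash",
    "no_crash", "crash",    "no_crash", "crash",
    "no_crash", "crash",    "no_crash", "crash",
    "no_crash", "crash",    "no_crash", "crash"
  ),
  loglik = c(
     -98.37344, -131.94910,  -74.45876, -104.57805,
    -115.41886, -117.81438, -111.70604, -115.48061,
    -108.85304, -117.02385,  -93.41932, -106.37023,
     -80.04840, -109.84042, -116.11213, -131.70746,
     -91.97336, -111.05259,  -82.65103, -105.11395
  ),
  stringsAsFactors = FALSE
)

# Sort by doc_id, then by loglik from highest to lowest, so that the
# first row for each doc_id is the category with the larger loglik.
sorted <- result[order(result$doc_id, -result$loglik), ]

# Keep only the first row per doc_id (assumption: larger loglik wins).
best <- sorted[!duplicated(sorted$doc_id), ]
rownames(best) <- NULL

print(best)
```

Sorting explicitly also removes any dependence on the arbitrary row order mentioned in the warning message.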