Example 6: Creating and Using R Machine Learning Models with Aster R

Teradata Aster® R User Guide Update 3

Product: Aster R
Release Number: 7.00.02.01
Published: December 2017
Language: English (United States)
Last Update: 2018-04-13
Product Category: Software
This example shows how to create a model and use it to make predictions on a new dataset. It uses the "Boston" dataset from the MASS package, split into separate training and test datasets so that the model is built on one and its performance is evaluated on the other.
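For orientation, the same split, fit, and predict workflow can be sketched in plain local R, without an Aster connection. This is only an illustrative sketch: the names `boston_local`, `train_idx`, `local_train`, `local_test`, `local_model`, and `local_pred` and the seed value are assumptions made here, and none of the Aster R functions used in the steps below are involved.

```r
library(MASS)  # provides the Boston data frame

set.seed(42)   # hypothetical seed, used only to make the split repeatable

# Add row identifiers, then hold out everything except 400 random rows.
boston_local <- data.frame(id = as.integer(row.names(Boston)), Boston)
train_idx    <- sample(seq_len(nrow(boston_local)), 400)
local_train  <- boston_local[train_idx, ]
local_test   <- boston_local[-train_idx, ]

# Fit the same formula that the example uses in Step 4.
local_model <- lm(medv ~ lstat + crim + rad + zn, data = local_train)

# Predict on the held-out rows and pair each prediction with its id.
local_pred <- data.frame(
  id        = local_test$id,
  medv_pred = as.numeric(predict(local_model, newdata = local_test))
)

head(local_pred)
```

Because the split is random, the ids and predictions from a local run will generally differ from the output shown later in this example.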
  1. Add row identifiers to the dataset.
    library(MASS)  # provides the Boston data frame
    
    # Prepend an integer id column taken from the row names.
    Boston.wi <- data.frame(id = as.integer(row.names(Boston)), Boston)
  2. Divide the dataset into training and test datasets.
    # Randomly pick 400 row indices for training; the remaining rows form the test set.
    train <- sample(1:nrow(Boston.wi), 400)
    
    Boston.wi.train <- Boston.wi[train, ]
    Boston.wi.test <- Boston.wi[-train, ]
  3. Create the virtual data frames.
    # Drop the target tables first so that ta.create() starts from a clean state.
    ta.dropTable("boston_tr", schemaName = "public")
    ta.dropTable("boston_te", schemaName = "public")
    
    tadf.boston.train <- ta.create(Boston.wi.train, table = 'boston_tr', 
                                   schemaName = 'public', tableType = 'fact', 
                                   partitionKey = 'id')
    
    tadf.boston.test <- ta.create(Boston.wi.test, table = 'boston_te',
                                  schemaName = 'public', tableType = 'fact', 
                                  partitionKey = 'id')
    
  4. Create an R function to build a linear regression model using the columns 'lstat', 'crim', 'rad', and 'zn' to predict 'medv'.
    lm_model <- function(tadf) {
      model <- lm(medv ~ lstat + crim + rad + zn, data = tadf)
      return(model)
    }
  5. Create the predict function.
    predict_lm <- function(tadf, model) {
      out <- predict(model, newdata = tadf)
      return(out)
    }
  6. Use the function aa.apply() to create the model.
    boston_model <- aa.apply(tadf.boston.train, FUN = lm_model, out.format = list(type = "object"))
    Because lm_model returns a model object rather than rows of data, out.format specifies the output type "object".
  7. Use aa.apply() to apply the predict function created in Step 5 to the test dataset.
    aa_predict <- aa.apply(tadf.boston.test, 
                           FUN = predict_lm, 
                           FUN.args = list(boston_model[1]), 
                           out.format = list(columns = c("id", "medv_pred"),
                                             columnTypes = c("integer", "numeric")))
    

    The first ten rows of the output are shown here.

    > aa_predict
         id  medv_pred
    1   113  19.503383
    2   161  29.071735
    3   221  25.407370
    4   260  28.196376
    5   284  32.731455
    6   364  21.083813
    7   464  24.901682
    8   488  23.922381
    9   165  23.459021
    10  272  28.461022
    
  8. Compare the predicted and observed values for 'medv'.
    1. Create a data frame containing only the 'id' and observed 'medv' values.
      aa_obs <- tadf.boston.test[, c("id", "medv")]
    2. Use the ta.join() function to create a table containing the observed ('medv') and predicted ('medv_pred') values for each 'id' in the original test dataset.
      ta.join(aa_obs, aa_predict, by="id")

      The first few rows of output are shown here.

          medv  medv_pred  x.id  y.id
      1   22.2  28.548877    63    63
      2   18.5  24.704342   115   115
      3   24.4  29.219659   251   251
      4   21.6  26.948964   314   314
      5   23.1  27.910150   322   322
      6    5.0   9.131461   406   406
      7   20.4  21.392737   108   108
      8   15.6  20.316601   156   156
      9   50.0  31.023614   164   164
      10  30.5  30.666012   192   192
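
The comparison can also be summarized numerically. The following is a minimal sketch in plain R, assuming the ten joined rows shown above have been copied by hand into an ordinary local data frame. The name `joined` and the single `id` column are hypothetical. How to bring rows from a virtual data frame to the client is not covered here.

```r
# The ten rows printed above, entered by hand as a local data frame.
joined <- data.frame(
  id        = c(63, 115, 251, 314, 322, 406, 108, 156, 164, 192),
  medv      = c(22.2, 18.5, 24.4, 21.6, 23.1, 5.0, 20.4, 15.6, 50.0, 30.5),
  medv_pred = c(28.548877, 24.704342, 29.219659, 26.948964, 27.910150,
                9.131461, 21.392737, 20.316601, 31.023614, 30.666012)
)

# Residual = observed value minus predicted value.
joined$residual <- joined$medv - joined$medv_pred

rmse <- sqrt(mean(joined$residual^2))  # root mean squared error
mae  <- mean(abs(joined$residual))     # mean absolute error

# List the rows with the largest absolute error first.
joined[order(-abs(joined$residual)), ]
```

In these ten rows the largest miss is id 164, where the observed value is 50.0 and the prediction is about 31.0. Ten rows are too few to judge the model, so the same calculation would normally be run over the full test set.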