7.00.02.01 - Example 3: Regression - Aster R

Teradata Aster® R User GuideUpdate 3

prodname
Aster R
vrm_release
7.00.02.01
created_date
December 2017
category
Programming Reference
User Guide
featnum
B700-1033-700K
The input table in this example is taken from the "cats" dataset in the R package "MASS".

This example reads all of the rows from the input table, builds a linear regression model, and outputs a table containing the original input columns plus a column containing the fitted values from the regression.

  1. Create the R script with the name "cats_regr_1103.R".
    # Script filename: cats_regr_1103.R
    #File descriptor pointing to standard input
    in_table = file(description="stdin",open="r") 
    
    # Read column values from stdin into a vector, nrows = -1 means read all rows
    while(1){ 
        fields<-try(read.table(in_table,header=FALSE,sep="\t",quote="",nrows=-1),silent=TRUE
        ) 
        if(inherits(fields,"try-error"))   
        break 
        #Fields are from database table cats_tmp 
        sex <- as.character(fields[,1]) 
        body_weight <- as.numeric(fields[,2]) 
        heart_weight   <-
        as.numeric(fields[,3]) 
         
        # do regression 
        regr <- lm(heart_weight ~ body_weight) 
        outrec<-data.frame(sex, body_weight, heart_weight, regr$fitted )   
    
        #write to standard output stream 
        write.table(outrec,
        stdout(),col.names=FALSE,row.names=FALSE,quote=FALSE,sep="\t")
    }
    # END
  2. Load the R package "MASS" for use in this example.
    library(MASS)
  3. Create a table in the database based on the input data "cats" from the loaded R package "MASS", and convert it to a virtual data frame.
    ta.create(cats, 
    table="cats_tmp", 
    tableType="dimension", 
    colTypes=c("varchar", "numeric", "numeric")
    )
    
    db.tadf.cats <- ta.data.frame("cats_tmp")
  4. Create the vector defining the output column names and types.
    outputs_cats <- c("sex char","BodyWt numeric","HeartWt numeric","FittedHtWt numeric")
  5. Install the R script on the cluster.
    ta.install.scripts('cats_regr_1103.R', schema="public")
  6. Use the Aster R runner function ta.source() to run the script.
    ta.source(data=db.tadf.cats, script="cats_regr_1103.R", output=outputs_cats)
    The output is shown here.
    
        sex BodyWt HeartWt    FittedHtWt
    1     F    2.0     7.0      7.711463
    2     F    2.0     7.4      7.711463
    3     F    2.0     9.5      7.711463
    4     F    2.1     7.2      8.114869
    5     F    2.1     7.3      8.114869
    6     F    2.1     7.6      8.114869
    7     F    2.1     8.1      8.114869
    8     F    2.1     8.2      8.114869
    9     F    2.1     8.3      8.114869
    10    F    2.1     8.5      8.114869
    11    F    2.1     8.7      8.114869
    12    F    2.1     9.8      8.114869
    13    F    2.2     7.1      8.518276
    14    F    2.2     8.7      8.518276
    15    F    2.2     9.1      8.518276
    16    F    2.2     9.7      8.518276
    17    F    2.2    10.9      8.518276
    18    F    2.2    11.0      8.518276
    19    F    2.3     7.3      8.921682
    20    F    2.3     7.9      8.921682
    ...   ...  ...     ...      ...
    Users can also run this example using ACT commands from within the Aster Database as shown here. The output is the same.
    1. Create the R script and the input table "cats_tmp" in the database, as in Step 1 to 3.
    2. Install the R script.
      \install cats_regr_1103.R
    3. Call the Stream function as shown here.
      
      select * from stream (on cats_tmp 
      script('Rexec --vanilla cats_regr_1103.R') 
      outputs('sex varchar', 'BodyWt numeric','HeartWt numeric', 'FittedHtWt numeric') delimiter('\t') 
      );