7.00.02.01 - Using the Random Forest Model with Aster R - Aster R

Teradata Aster® R User Guide Update 3

prodname: Aster R
vrm_release: 7.00.02.01
created_date: December 2017
category: Programming Reference, User Guide
featnum: B700-1033-700K

This section uses the dataset "fgl" found in the R package "MASS". This dataset includes nine different measurements on 214 samples of different types of glass. A tenth column indicates the type of glass, classifying the samples into one of six types.
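
To get a feel for the data before moving it anywhere, the dataset can be inspected locally. This is a minimal sketch in plain R and assumes only that the MASS package is installed; it does not use any Aster R functions.

```r
library(MASS)  # provides the fgl data frame

# Overall shape: samples (rows) and columns.
dim(fgl)

# Nine measurement columns followed by the "type" column.
names(fgl)

# Number of samples for each of the six glass types.
table(fgl$type)
```

The per-type counts are uneven, which is why the later steps split each glass type separately rather than sampling the whole table at once.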

This section also illustrates the use of the ta.push() function to transfer data from R into the Aster Database.
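
Before data is transferred, it can be useful to confirm locally that the R data frame lines up with the intended table columns. The following is a minimal base-R sketch of such a check and is not part of the Aster R API. The names `local_glass` and `expected_cols` are hypothetical, and converting `type` to character is an assumption made for illustration; whether ta.push() needs it is not stated in this guide.

```r
library(MASS)  # provides the fgl data frame

# Local copy with an integer id column built from the row names.
local_glass <- cbind(id = as.integer(rownames(fgl)), fgl)

# Assumption: the target table uses these column names, in this order.
expected_cols <- c("id", "RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe", "type")
stopifnot(identical(names(local_glass), expected_cols))

# Assumption: the target column for type is character, so convert the factor.
local_glass$type <- as.character(local_glass$type)

# Show the resulting column types.
str(local_glass)
```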

  1. Load the "fgl" dataset and add an integer "id" column to the data frame.
    library(MASS)
    fgl_with_rowids<-cbind(id=as.integer(rownames(fgl)), fgl)
  2. Create an empty data frame.
    id<-integer()
    RI<-numeric()
    Na<-numeric()
    Mg<-numeric()
    Al<-numeric()
    Si<-numeric()
    K<-numeric()
    Ca<-numeric()
    Ba<-numeric()
    Fe<-numeric()
    type<-character()
    
    glass_data<-data.frame(id,RI,Na,Mg,Al,Si,K,Ca,Ba,Fe,type)
  3. Use the empty data frame to create an empty Aster Database table with the same schema to hold the data.
    ta.create(glass_data, 
    table="fgltmptable", 
    schema="public", 
    tableType="dimension"
    )
  4. Create a virtual data frame that refers to the new table. The virtual data frame lets you use Aster R functions to access and manipulate the data.
    tadf_glass<-ta.data.frame("fgltmptable")
  5. Use the function ta.push() to copy the data from the R data frame "fgl_with_rowids" to the virtual data frame "tadf_glass", and then use ta.head() to view the first rows.
    tadf_glass<-ta.push(tadf_glass, fgl_with_rowids)
    ta.head(tadf_glass)
         id     RI     Na    Mg    Al     Si     K    Ca  Ba    Fe  type
    1     1   3.01  13.64  4.49  1.10  71.78  0.06  8.75   0  0.00  WinF
    2     2  -0.39  13.89  3.60  1.36  72.73  0.48  7.83   0  0.00  WinF
    3     3  -1.82  13.53  3.55  1.54  72.99  0.39  7.78   0  0.00  WinF
    4     4  -0.34  13.21  3.69  1.29  72.61  0.57  8.22   0  0.00  WinF
    5     5  -0.58  13.27  3.62  1.24  73.08  0.55  8.07   0  0.00  WinF
    6     6  -2.04  12.79  3.61  1.62  72.97  0.64  8.07   0  0.26  WinF
  6. Divide the data into training and test datasets. To ensure that the training set has representatives of each of the six types of glass, split the table by the "type" column, and then divide the observations for each type into training and test subsets. Finally, combine the per-type training subsets into one training dataset and the per-type test subsets into one test dataset, so that together they cover the entire dataset.
    1. Use the function ta.split() to split the virtual data frame according to the glass type.
      glass_types<-ta.split(tadf_glass, "type")

      The output of the ta.split() function is a ta.list.

    2. Get each individual data frame from the ta.list "glass_types".
      Con<-glass_types[[1]]
      Head<-glass_types[[2]]
      Tabl<-glass_types[[3]]
      Veh<-glass_types[[4]]
      WinF<-glass_types[[5]]
      WinNF<-glass_types[[6]]
    3. Create the training and test subsets for each glass type, using 70% for the training subsets and 30% for the test subsets.
      Con_train_indices<-sample(1:ta.nrow(Con), floor(0.7*ta.nrow(Con)))
      Con.test<-Con[-Con_train_indices,]
      Con.train<-Con[Con_train_indices,]
      Tabl_train_indices<-sample(1:ta.nrow(Tabl), floor(0.7*ta.nrow(Tabl)))
      Tabl.test<-Tabl[-Tabl_train_indices,]
      Tabl.train<-Tabl[Tabl_train_indices,]
      Veh_train_indices<-sample(1:ta.nrow(Veh), floor(0.7*ta.nrow(Veh)))
      Veh.test<-Veh[-Veh_train_indices,]
      Veh.train<-Veh[Veh_train_indices,]
      Head_train_indices<-sample(1:ta.nrow(Head), floor(0.7*ta.nrow(Head)))
      Head.test<-Head[-Head_train_indices,]
      Head.train<-Head[Head_train_indices,]
      WinF_train_indices<-sample(1:ta.nrow(WinF), floor(0.7*ta.nrow(WinF)))
      WinF.test<-WinF[-WinF_train_indices,]
      WinF.train<-WinF[WinF_train_indices,]
      WinNF_train_indices<-sample(1:ta.nrow(WinNF), floor(0.7*ta.nrow(WinNF)))
      WinNF.test<-WinNF[-WinNF_train_indices,]
      WinNF.train<-WinNF[WinNF_train_indices,]
    4. Combine the training and test subsets to create training and test datasets that contain samples of each of the six glass types.
      fgl.test<-rbind(WinNF.test, Con.test, Tabl.test, Veh.test, WinF.test, Head.test)
      
      fgl.train<-rbind(WinNF.train, Con.train, Tabl.train, Veh.train, WinF.train, Head.train)
  7. Create virtual data frames for the training and test datasets.
    tadf_test<-as.ta.data.frame(fgl.test)
    
    tadf_train<-as.ta.data.frame(fgl.train)
  8. Create Random Forest models using the training dataset. The first call relies on the default settings; the second sets ntree = 6 and mtry = 3.
    glass_rf_list<-aa.forest(
        formula = (type~RI+Na+Mg+Al+Si+K+Ca+Ba+Fe),
        tree.type = "classification",
        data = tadf_train )
    glass_rf_list_1<-aa.forest(
        formula = (type~RI+Na+Mg+Al+Si+K+Ca+Ba+Fe),
        tree.type = "classification",
        data = tadf_train,
        ntree = 6,
        mtry = 3 )
  9. Predict on the test dataset with each of the two models.
    aa.forest.predict(
        object = glass_rf_list,
        newdata = tadf_test,
        id.column = "id"
        )
    aa.forest.predict(
        object = glass_rf_list_1,
        newdata = tadf_test,
        id.column = "id"
        )
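
Step 6 repeats the same 70/30 split once per glass type. As a point of comparison, the same stratified split can be sketched in plain R on the local "fgl" data frame. This is only an illustration of the idea and does not operate on virtual data frames; the helper name `stratified_train_rows`, the seed value, and the `_local` variable names are hypothetical.

```r
library(MASS)  # provides the fgl data frame

set.seed(42)  # hypothetical seed, used only to make the sketch repeatable

# Hypothetical helper: pick a fraction of the row indices within each class.
stratified_train_rows <- function(labels, frac = 0.7) {
  idx_by_class <- split(seq_along(labels), labels)
  picked <- lapply(idx_by_class, function(idx) {
    n_train <- floor(frac * length(idx))
    # Index into idx so that a class with a single row is handled correctly.
    idx[sample.int(length(idx), n_train)]
  })
  sort(unlist(picked, use.names = FALSE))
}

train_rows <- stratified_train_rows(fgl$type)
fgl_train_local <- fgl[train_rows, ]
fgl_test_local  <- fgl[-train_rows, ]

# Every glass type should appear in both subsets.
table(fgl_train_local$type)
table(fgl_test_local$type)
```

Splitting the row indices by class keeps the per-type proportions close to 70/30 without writing a separate block for each glass type.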