7.00.02.01 - Example 2: ta.by() and aa.tapply() - Aster R

Teradata Aster® R User Guide, Update 3

Product name: Aster R
Release: 7.00.02.01
Created: December 2017
Category: Programming Reference; User Guide
Feature number: B700-1033-700K

This example uses the dataset "Sitka" found in the R package "MASS". The dataset consists of growth data for 79 trees, 54 of which were grown in an ozone-rich environment and the other 25 were used as controls. Each tree was measured at 5 different times.
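You can check the counts in this description locally before loading anything into Aster. The following is a minimal base R sketch. It assumes only that the MASS package, which ships with standard R distributions, is installed. The names trees_per_group and obs_per_tree are illustrative and are not part of Aster R.

```r
library(MASS)  # provides the Sitka data frame (columns: size, Time, tree, treat)

# Number of distinct trees in each treatment group
trees_per_group <- tapply(Sitka$tree, Sitka$treat,
                          function(t) length(unique(t)))
print(trees_per_group)

# Number of measurements recorded for each tree
obs_per_tree <- table(Sitka$tree)
print(range(obs_per_tree))
```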

This example uses the R function subset() to divide the dataset into control and ozone-exposed subsets. For the ozone-exposed subset, it then uses the Aster R runner function ta.by() together with the R function lm() to create 54 linear regression models, one for each tree in the group. Equivalent functionality is available with the function aa.tapply().
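ta.by() evaluates the function against data stored in the Aster Database. As a point of comparison only, you can do the same per-tree modeling on a local data frame with base R's split() and lapply(). This sketch shows the computation. It does not show how ta.by() is implemented, and ozone_local and models_local are hypothetical names.

```r
library(MASS)  # provides the Sitka data frame

# Ozone-exposed trees only, without the 'treat' column
ozone_local <- subset(Sitka, treat == "ozone", select = c(size, Time, tree))

# Split the rows by tree and fit size ~ Time separately for each tree
models_local <- lapply(split(ozone_local, ozone_local$tree),
                       function(x) lm(size ~ Time, data = x))

length(models_local)   # one model per ozone-exposed tree
models_local[[1]]
```

Each element is an lm object fitted to that tree's five measurements, which makes it comparable to one element of the list that ta.by() returns.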

  1. Load the MASS package, which provides the Sitka dataset, and divide the dataset into control and ozone subsets.
    library(MASS)
    
    sitka_ozone<-subset(Sitka, treat=="ozone")
    
    sitka_control<-subset(Sitka, treat=="control")
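  2. Remove the 'treat' column, which is no longer needed. The remaining columns are size, Time, and tree.
    # Select columns by name so the result does not depend on column order
    sitka_ozone<-sitka_ozone[,c("size", "Time", "tree")]
    
    sitka_control<-sitka_control[,c("size", "Time", "tree")]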
  3. Create a table for each subset in the Aster Database.
    ta.create(sitka_ozone,
              table="sitka_ozone",
              schemaName="public",
              tableType="dimension",
              row.names=TRUE,
              colTypes=NULL)
    
    ta.create(sitka_control,
              table="sitka_control",
              schemaName="public",
              tableType="dimension",
              row.names=TRUE,
              colTypes=NULL)
  4. Create virtual data frames.
    db.tadf_sitka_ozone<-ta.data.frame("sitka_ozone")
    
    db.tadf_sitka_control<-ta.data.frame("sitka_control")
  5. Create a linear regression model for each tree in the 'ozone' dataset, using the runner function ta.by().
    LM_models<-ta.by(db.tadf_sitka_ozone,
                     db.tadf_sitka_ozone[,"tree"],
                     function(x) lm(size~Time, data=x))
    
    > class(LM_models)
    [1] "list"
    > length(LM_models)
    [1] 54
    
    > LM_models[[1]]
    Call:
    lm(formula = size ~ Time, data = x)
    Coefficients:
    (Intercept)         Time 
        2.20771      0.01572 
    Equivalent functionality is available with the function aa.tapply(). The following call applies it to the control subset and creates one model for each of the 25 control trees.
    func_a<-function(x) {lm(size~Time, data=x)}
    LM_models_2<-aa.tapply(db.tadf_sitka_control,
                           FUN=func_a,
                           INDEX=db.tadf_sitka_control$tree,
                           out.format=list(type="object"))
    
    The output LM_models_2 is a virtual object:
    
    > class(LM_models_2)
    [1] "aa.object"
    
    It can be converted to a standard R list:
    
    LM_models_2_r<-as.object(LM_models_2)
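After conversion, LM_models and LM_models_2_r are ordinary R lists. If each element is an lm object, as the printed LM_models[[1]] suggests, you can summarize the models with standard R tools. The sketch below collects the intercept and slope of every model into a single matrix. So that it runs without an Aster connection, it builds an equivalent local list (models_local, a hypothetical stand-in) from Sitka. With a live connection, the same sapply() call should work on LM_models, but this is an assumption and has not been verified here.

```r
library(MASS)  # provides the Sitka data frame

# Local stand-in for LM_models: one size ~ Time model per ozone-exposed tree
ozone_local <- subset(Sitka, treat == "ozone")
models_local <- lapply(split(ozone_local, ozone_local$tree),
                       function(x) lm(size ~ Time, data = x))

# sapply() returns a 2 x 54 matrix (intercept and slope per tree);
# t() turns it into one row per tree, with tree IDs as row names
coef_table <- t(sapply(models_local, coef))
head(coef_table)

# Distribution of the per-tree growth rates (slope on Time)
summary(coef_table[, "Time"])
```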