Example 2: ta.by() and aa.tapply() - Aster R

Teradata Aster® R User Guide Update 3

Product
Aster R
Release Number
7.00.02.01
Published
December 2017
Language
English (United States)
Last Update
2018-04-13
lifecycle
previous
Product Category
Software

This example uses the dataset "Sitka" from the R package "MASS". The dataset contains growth data for 79 trees: 54 were grown in an ozone-rich environment, and the other 25 served as controls. Each tree was measured at 5 different times.
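The counts quoted above can be checked locally before any data is loaded into the database. The following is a minimal sketch in plain R, assuming the MASS package is installed; it does not use Aster R, and the variable names (n_trees, trees_per_group, obs_per_tree) are illustrative only.

```r
# Local check of the Sitka dataset structure (plain R, no Aster connection).
library(MASS)
data(Sitka)

# Columns: size, Time, tree, treat
str(Sitka)

# Number of distinct trees in the whole dataset
n_trees <- length(unique(Sitka$tree))

# Number of distinct trees in each treatment group
trees_per_group <- tapply(Sitka$tree, Sitka$treat,
                          function(t) length(unique(t)))

# Number of measurements recorded for each tree
obs_per_tree <- table(Sitka$tree)

print(n_trees)
print(trees_per_group)
print(range(obs_per_tree))
```

If the dataset matches the description above, n_trees is 79, the groups contain 54 (ozone) and 25 (control) trees, and every tree has 5 measurements.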

In this example, the R subset() function divides the dataset into control and ozone-exposed subsets. Then, for the ozone-exposed subset, the Aster R runner function ta.by() is combined with the R function lm() to create 54 linear regression models, one for each tree in the group. Equivalent functionality is available with the function aa.tapply().

  1. Divide the Sitka dataset into control and ozone subsets.
    sitka_ozone<-subset(Sitka, treat=="ozone")
    
    sitka_control<-subset(Sitka, treat=="control")
  2. Remove the 'treat' column as it is no longer needed.
    sitka_ozone<-sitka_ozone[,1:3]
    
    sitka_control<-sitka_control[,1:3]
  3. Create a table for each subset in the Aster Database.
    
    ta.create(sitka_ozone,
              table="sitka_ozone",
              schemaName="public",
              tableType="dimension",
              row.names=TRUE,
              colTypes=NULL)
    
    ta.create(sitka_control,
              table="sitka_control",
              schemaName="public",
              tableType="dimension",
              row.names=TRUE,
              colTypes=NULL)
  4. Create virtual data frames.
    db.tadf_sitka_ozone<-ta.data.frame("sitka_ozone")
    
    db.tadf_sitka_control<-ta.data.frame("sitka_control")
  5. Create a linear regression model for each tree in the 'ozone' dataset, using the runner function ta.by().
    > LM_models<-ta.by(db.tadf_sitka_ozone, db.tadf_sitka_ozone[,"tree"], function(x) lm(size~Time, data=x))
    
    > class(LM_models)
    [1] "list"
    > length(LM_models)
    [1] 54
    
    > LM_models[[1]]
    Call:
    lm(formula = size ~ Time, data = x)
    Coefficients:
    (Intercept)         Time 
        2.20771      0.01572 
    Equivalent functionality is available using the function aa.tapply(). The following call fits one model per tree, this time on the control subset.
    func_a<-function(x) {lm(size~Time, data = x)}
    LM_models_2<-aa.tapply(db.tadf_sitka_control, 
                           FUN=func_a, 
                           INDEX=db.tadf_sitka_control$tree,
                           out.format=list(type="object"))
    

    The output LM_models_2 is a virtual object.

    class(LM_models_2)
    [1] "aa.object"
    

    It can be converted to a standard R list with as.object().

    LM_models_2_r<-as.object(LM_models_2)
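
For comparison, the same per-tree grouping can be reproduced locally without a database connection. The following is a minimal sketch in plain R using split() and lapply(), assuming the MASS package is installed; it is a local analogue of the grouping pattern, not the Aster R implementation, and the names fit_tree, local_models, and coefs are illustrative only.

```r
# Local analogue of the per-tree regression (plain R, no Aster connection).
library(MASS)

# Keep the ozone-exposed trees and drop the 'treat' column, as in steps 1 and 2
sitka_ozone <- subset(Sitka, treat == "ozone")[, 1:3]

# Same model function as passed to ta.by() and aa.tapply()
fit_tree <- function(x) lm(size ~ Time, data = x)

# Split the rows by tree and fit one model per group
local_models <- lapply(split(sitka_ozone, sitka_ozone$tree), fit_tree)

# One model per ozone-exposed tree
print(length(local_models))

# Collect intercepts and slopes into a matrix with one row per tree
coefs <- t(sapply(local_models, coef))
print(head(coefs))
```

The list is named by tree identifier, so a model for a specific tree can be retrieved by name. The order of the models may differ from the order returned by ta.by(), so individual entries should be matched by tree rather than by position.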