The examples in this section illustrate the use of aa.tapply(). The function aa.tapply() applies an R function to each partition of a virtual data frame. The partitions are defined by the values of the INDEX argument.
These examples use the dataset "npk" from the MASS package.
Create the table in the database.
library(MASS)
# Copy the npk data frame into table myschema.npk (fact table, partitioned on block)
ta.create(npk, table = 'npk', schemaName = 'myschema', tableType = 'fact', partitionKey = 'block')
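Before the data is loaded, it can help to check its shape locally. The following is a minimal base-R sketch. It assumes only that npk is the copy that ships with R's base datasets package. The sketch shows that the data frame has 24 rows and five columns, and that block, the column passed as partitionKey, takes six distinct values with four rows each. The name rows_per_block is used only for illustration.

```r
# npk ships with R's base 'datasets' package, so no extra package is needed here
str(npk)

# Number of rows for each value of the column used as partitionKey in ta.create()
rows_per_block <- table(npk$block)
print(rows_per_block)
```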
Create the virtual data frame and update the column names to lowercase.
# Virtual data frame backed by the database table myschema.npk
tadf_peas <- ta.data.frame("npk", schemaName = "myschema")
ta.colnames(tadf_peas) <- c("block", "n", "p", "k", "yield")
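In the original npk data frame, only the factor columns N, P and K are uppercase. The block and yield columns are already lowercase. For comparison, the following sketch makes the same change to an ordinary in-memory copy. The name npk_local is hypothetical and is used only here.

```r
# Local copy of npk; its original column names are block, N, P, K, yield
npk_local <- npk

# Lower-case every column name, giving the same names assigned with ta.colnames()
colnames(npk_local) <- tolower(colnames(npk_local))
print(colnames(npk_local))
```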
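Create a function vec_avg that returns the mean of the vector it receives. aa.tapply() calls it once per partition, so partitioning by block gives the average yield of each block.
vec_avg <- function(vect) {
  v <- mean(vect)
  return(v)
}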
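vec_avg only averages whatever vector it is given, so it can be tested locally before it runs in the database. The following sketch uses the in-memory npk data. The name block1_yield is illustrative. The mean for block 1 should match the value that aa.tapply() reports for block 1 in this section.

```r
vec_avg <- function(vect) {
  v <- mean(vect)
  return(v)
}

# Yield values for block 1 in the local copy of npk (block is a factor with levels "1".."6")
block1_yield <- npk$yield[npk$block == "1"]
block1_avg <- vec_avg(block1_yield)
print(block1_avg)  # 54.025
```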
Use aa.tapply() to apply vec_avg to the yield column (column 5 of tadf_peas), partitioned by block. This calculates the average yield of each block.
r1 <- aa.tapply(tadf_peas[, 5], FUN = vec_avg, INDEX = tadf_peas$block, out.format = list(type = "object"))
The output "r1" is a virtual object (type "aa.object") of length 6.
> r1
$block=6
--------------
[1] 56.35

$block=2
--------------
[1] 57.45

$block=3
--------------
[1] 60.775

$block=4
--------------
[1] 50.125

$block=1
--------------
[1] 54.025

$block=5
--------------
[1] 50.525

> class(r1)
[1] "aa.object"
> ta.length(r1)
[1] 6
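aa.tapply() follows the same split-apply pattern as base R's tapply(), but it runs against the database. As a local cross-check, base tapply() should reproduce the six block averages in r1. This is a sketch on the in-memory npk data. It does not describe how aa.tapply() is implemented. Base tapply() returns the groups in factor-level order (1 to 6), so the order differs from the r1 printout. The name avg_by_block is illustrative.

```r
vec_avg <- function(vect) {
  v <- mean(vect)
  return(v)
}

# Split npk$yield by block and apply vec_avg to each group.
# The result is a 1-d array named by block level, in level order "1".."6".
avg_by_block <- tapply(npk$yield, npk$block, vec_avg)
print(avg_by_block)
```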
Use aa.tapply() to run vec_avg again, this time partitioning by two columns, block and n.
r2 <- aa.tapply(tadf_peas[, 5], FUN = vec_avg, INDEX = tadf_peas[, c("block", "n")], out.format = list(type = "object"))
The output "r2" is:
> r2
$block=6,$n=0
-------------------
[1] 54.6

$block=2,$n=1
-------------------
[1] 59.15

$block=2,$n=0
-------------------
[1] 55.75

$block=6,$n=1
-------------------
[1] 58.1

$block=4,$n=1
-------------------
[1] 55.4

$block=3,$n=0
-------------------
[1] 58.9

$block=3,$n=1
-------------------
[1] 62.65

$block=4,$n=0
-------------------
[1] 44.85

$block=1,$n=0
-------------------
[1] 48.15

$block=5,$n=1
-------------------
[1] 50.9

$block=5,$n=0
-------------------
[1] 50.15

$block=1,$n=1
-------------------
[1] 59.9
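In base R, partitioning by two columns corresponds to passing a list of factors as the INDEX argument of tapply(). The following is a sketch on the in-memory npk data, where the column is still named N. The result is a 6 x 2 matrix with blocks as rows and the levels of N as columns. Its twelve cells should equal the twelve entries of r2. The name avg_by_block_n is illustrative.

```r
vec_avg <- function(vect) {
  v <- mean(vect)
  return(v)
}

# A list of two factors partitions the yields by every (block, N) combination.
# Rows are block levels "1".."6"; columns are N levels "0" and "1".
avg_by_block_n <- tapply(npk$yield, list(block = npk$block, n = npk$N), vec_avg)
print(avg_by_block_n)
```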